=Paper= {{Paper |id=Vol-1446/GEDM2015_proc |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-1446/GEDM2015_proc.pdf |volume=Vol-1446 |dblpUrl=https://dblp.org/rec/conf/edm/XueLC15 }} ==None== https://ceur-ws.org/Vol-1446/GEDM2015_proc.pdf

Graph-based Educational Data Mining (G-EDM 2015)

Collin F. Lynch Dr. Tiffany Barnes
Department of Computer Department of Computer
Science Science
North Carolina State North Carolina State
University University
Raleigh, North Carolina Raleigh, North Carolina
cflynch@ncsu.edu tmbarnes@ncsu.edu
Dr. Jennifer Albert Michael Eagle
Department of Computer Science Department of Computer
North Carolina State University Science
Raleigh, North Carolina North Carolina State
jlsharp@ncsu.edu University
Raleigh, North Carolina
mjeagle@ncsu.edu

1. INTRODUCTION Thus, graphs are simple in concept, general in structure, and
Fundamentally, a graph is a simple concept. At a basic level a have wide applications for Educational Data Mining (EDM).
graph is a set of relationships {e(n0 ,n2 ),e(n0 ,nj ),...,e(nj−1 ,nj )} Despite the importance of graphs to data mining and data anal-
between elements. This simple concept, however, has afforded the ysis there exists no strong community of researchers focused on
development of a complex theory of graphs [1] and rich algorithms Graph-Based Educational Data Mining. Such a community is
for combinatorics [7] and clustering [4]. This has, in turn, made important to foster useful interactions, share tools and techniques,
graphs a fundamental part of educational data mining. and to explore common problems.

Many types of data can be naturally represented as graphs such 2. GEDM 2014
as social network data, user-system interaction logs, argument This is the second workshop on Graph-Based Educational Data
diagrams, logical proofs, and forum discussions. Such data has Mining. The first was held in conjunction with EDM 2014 in
grown exponentially in volume as courses have moved online and London [17]. The focus of that workshop was on seeding an initial
educational technology has been incorporated into the traditional community of researchers, and on identifying shared problems, and
classroom. Analyzing it can help to answer a range of important avenues for research. The papers presented covered a range of top-
questions such as: ics including unique visualizations [13], social capital in educational
networks [8], graph mining [19, 11], and tutor construction [9].
• What path(s) do high-performing students take through online
educational materials? The group discussion sections at that workshop focused on the
• What social networks can foster or inhibit learning? distinct uses of graph data. Some of the work presented focused
• Do users of online learning tools behave as the system designers on student-produced graphs as solution representations (e.g. [14,
expect? 3]) while others focused more on the use of graphs for large-scale
• What diagnostic substructures are commonly found in student- analysis to support instructors or administrators (e.g. [18, 13]).
produced diagrams? These differing uses motivate different analytical techniques and,
• Can we use prior student data to identify students’ solution as participants noted, change our underlying assumptions about
plan, if any? the graph structures in important ways.
• Can we use prior student data to provide meaningful hints in
complex domains?
• Can we identify students who are particularly helpful based 3. GEDM 2015
upon their social interactions? Our goal in this second workshop was to build upon this nascent
community structure and to explore the following questions:

1. What common goals exist for graph analysis in EDM?
2. What shared resources such as tools and repositories are re-
quired to support the community?
3. How do the structures of the graphs and the analytical methods
change with the applications?

The papers that we include here fall into four broad categories:
interaction, induction, assessment, and MOOCs.
Work by Poulovassilis et al. [15] and Lynch et al. [12] focuses Data Mining 2014, co-located with 7th International
on analyzing user-system interactions in state based learning Conference on Educational Data Mining (EDM
environments. Poulovassilis et al. focuses on the analyses of 2014), London, United Kingdom, July 4-7, 2014., volume
individual users’ solution paths and presents a novel mechanism 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
to query solution paths and identify general solution strategies. [4] M. Girvan and M. E. J. Newman. Community
Lynch et al. by contrast, examined user-system interactions from structure in social and biological networks. Proc. of the
existing model-based tutors to examine the impact of specific National Academy of Sciences, 99(12):7821–7826, June 2002.
design decisions on student performance. [5] J. Guerra. Graph analysis
of student model networks. In C. F. Lynch, T. Barnes,
Price & Barnes [16] and Hicks et al. [6] focus on applying these J. Albert, and M. Eagle, editors, Proceedings of the Second
same analyses in the open-ended domain of programming. Unlike International Workshop on Graph-Based Educational Data
more discrete tutoring domains where users enter single equations Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
or select actions, programming tutors allow users to make drastic [6] A. Hicks, V. Catete, R. Zhi,
changes to their code on each step. This can pose challenges for Y. Dong, and T. Barnes. Bots: Selecting next-steps from
data-driven methods as the student states are frequently unique player traces in a puzzle game. In C. F. Lynch, T. Barnes,
and admit no easy single-step advice. Price and Barnes present a J. Albert, and M. Eagle, editors, Proceedings of the Second
novel method for addressing the data sparsity problem by focusing International Workshop on Graph-Based Educational Data
on minimal-distance changes between users [16] while in related Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
work Hicks et al. focuses on the use of path weighting to select [7] D. E. Knuth. The
actionable advice in a complex state space [6]. Art of Computer Programming: Combinatorial Algorithms,
Part 1, volume 4A. Addison-Wesley, 1st edition, 2011.
The goal in much of this work is to identify rules that can [8] V. Kovanovic, S. Joksimovic, D. Gasevic, and M. Hatala.
be used to characterize good and poor interactions or good and What is the source of social capital? the association
poor graphs. Xue at al. sought address this challenge in part via between social network position and social presence
the automatic induction of graph rules for student-produced dia- in communities of inquiry. In S. G. Santos and O. C. Santos,
grams [22]. In their ongoing work they are applying evolutionary editors, Proceedings of the Workshops held at Educational
computation to the induction of Augmented Graph Grammars, Data Mining 2014, co-located with 7th International
a graph-based formalism for rules about graphs. Conference on Educational Data Mining (EDM
2014), London, United Kingdom, July 4-7, 2014., volume
The work described by Leo-John et al. [10], Guerra [5] and 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
Weber & Vas [21], takes a different tack and focuses not on graphs [9] R. Kumar. Cross-domain performance of automatic tutor
representing solutions or interactions but on relationships. Leo- modeling algorithms. In S. G. Santos and O. C. Santos,
John et al. present a novel approach for identifying closely-related editors, Proceedings of the Workshops held at Educational
word problems via semantic networks. This work is designed to Data Mining 2014, co-located with 7th International
support content developers and educators in examining a set of Conference on Educational Data Mining (EDM
questions and in giving appropriate assignments. Guerra takes 2014), London, United Kingdom, July 4-7, 2014., volume
a similar approach to the assessment of users’ conceptual changes 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
when learning programming. He argues that the conceptual
[10] R.-J. Leo-John, T. McTavish, and R. Passonneau.
relationship graph affords a better mechanism for automatic as-
Semantic graphs for mathematics word problems based
sessment than individual component models. This approach is
on mathematics terminology. In C. F. Lynch, T. Barnes,
also taken up by Weber and Vas who present a toolkit for graph-
J. Albert, and M. Eagle, editors, Proceedings of the Second
based self-assessment that is designed to bring these conceptual
International Workshop on Graph-Based Educational Data
structures under students’ direct control.
Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
And finally, Vigentini & Clayphan [20], and Brown et al. [2] [11] C. F. Lynch. AGG: augmented graph grammars for complex
focus on the unique problems posed by MOOCs. Vigentini and heterogeneous data. In S. G. Santos and O. C. Santos,
Clayphan present work on the use of graph-based metrics to editors, Proceedings of the Workshops held at Educational
assess students’ on-line behaviors. Brown et al., by contrast, focus Data Mining 2014, co-located with 7th International
not on local behaviors but on social networks with the goal of Conference on Educational Data Mining (EDM
identifying stable sub-communities of users and of assessing the 2014), London, United Kingdom, July 4-7, 2014., volume
impact of social relationships on users’ class performance. 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
[12] C. F. Lynch, T. W. Price,
M. Chi, and T. Barnes. Using the hint factory to analyze
4. REFERENCES model-based tutoring systems. In C. F. Lynch, T. Barnes,
[1] B. Bollobás.
J. Albert, and M. Eagle, editors, Proceedings of the Second
Modern Graph Theory. Springer Science+Business
International Workshop on Graph-Based Educational Data
Media Inc. New York, New York, U.S.A., 1998.
Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
[2] R. Brown, C. F. Lynch, Y. Wang,
[13] T. McTavish. Facilitating graph interpretation via interactive
M. Eagle, J. Albert, T. Barnes, R. Baker, Y. Bergner,
hierarchical edges. In S. G. Santos and O. C. Santos,
and D. McNamara. Communities of performance
editors, Proceedings of the Workshops held at Educational
& communities of preference. In C. F. Lynch, T. Barnes,
Data Mining 2014, co-located with 7th International
J. Albert, and M. Eagle, editors, Proceedings of the Second
Conference on Educational Data Mining (EDM
International Workshop on Graph-Based Educational Data
2014), London, United Kingdom, July 4-7, 2014., volume
Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
[3] R. Dekel and K. Gal. On-line plan recognition in exploratory
[14] B. Mostafavi and T. Barnes. Evaluation of logic proof problem
learning environments. In S. G. Santos and O. C. Santos,
difficulty through student performance data. In S. G. Santos
editors, Proceedings of the Workshops held at Educational
and O. C. Santos, editors, Proceedings of the Workshops Data Mining 2014, co-located with 7th International
held at Educational Data Mining 2014, co-located with 7th Conference on Educational Data Mining (EDM
International Conference on Educational Data Mining (EDM 2014), London, United Kingdom, July 4-7, 2014., volume
2014), London, United Kingdom, July 4-7, 2014., volume 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014. [19] K. Vaculı́k, L. Nezvalová, and L. Popelı́nsky. Graph mining
[15] A. Poulovassilis, S. G. Santos, and M. Mavrikis. Graph-based and outlier detection meet logic proof tutoring. In S. G. Santos
modelling of students’ interaction data from exploratory and O. C. Santos, editors, Proceedings of the Workshops
learning environments. In C. F. Lynch, T. Barnes, held at Educational Data Mining 2014, co-located with 7th
J. Albert, and M. Eagle, editors, Proceedings of the Second International Conference on Educational Data Mining (EDM
International Workshop on Graph-Based Educational Data 2014), London, United Kingdom, July 4-7, 2014., volume
Mining (GEDM 2015). CEUR-WS, June 2015. (in press). 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
[16] T. Price and T. Barnes. An [20] L. Vigentini and A. Clayphan. Exploring the function
exploration of data-driven hint generation in an open-ended of discussion forums in moocs: comparing data mining
programming problem. In C. F. Lynch, T. Barnes, and graph-based approaches. In C. F. Lynch, T. Barnes,
J. Albert, and M. Eagle, editors, Proceedings of the Second J. Albert, and M. Eagle, editors, Proceedings of the Second
International Workshop on Graph-Based Educational Data International Workshop on Graph-Based Educational Data
Mining (GEDM 2015). CEUR-WS, June 2015. (in press). Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
[17] S. G. Santos and O. C. Santos, [21] C. Weber and R. Vas. Studio: Ontology-based
editors. Proceedings of the Workshops held at Educational educational self-assessment. In C. F. Lynch, T. Barnes,
Data Mining 2014, co-located with 7th International J. Albert, and M. Eagle, editors, Proceedings of the Second
Conference on Educational Data Mining (EDM International Workshop on Graph-Based Educational Data
2014), London, United Kingdom, July 4-7, 2014, volume Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014. [22] L. Xue, C. F. Lynch, and M. Chi. Graph grammar induction
[18] V. Sheshadri, C. Lynch, and T. Barnes. by genetic programming. In C. F. Lynch, T. Barnes,
Invis: An EDM tool for graphical rendering and analysis of J. Albert, and M. Eagle, editors, Proceedings of the Second
student interaction data. In S. G. Santos and O. C. Santos, International Workshop on Graph-Based Educational Data
editors, Proceedings of the Workshops held at Educational Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
Graph Grammar Induction via Evolutionary Computation

Linting Xue Collin F.Lynch Min Chi
Electrical Engineering Computer Science Computer Science
Department Department Department
North Carolina State North Carolina State North Carolina State
University University University
Raleigh, North Carolina, Raleigh, North Carolina, Raleigh, North Carolina,
U.S.A. U.S.A. U.S.A.
lxue3@ncsu.edu cflynch@ncsu.edu mchi@ncsu.edu

ABSTRACT In this paper we describe our ongoing work on the automatic
Augmented Graph Grammars provide a robust formalism induction of Augmented Graph Grammars via Evolutionary
for representing and evaluating graph structures. With the Computation (EC). Our long-term goal in this work is to
advent of robust graph libraries such as AGG, it has be- develop automated techniques that can extract empirically-
come possible to use graph grammars to analyze realistic valid graph rules which can, in turn, be used to classify
data. Prior studies have shown that graph rules can be used student-produced argument diagrams and to provide the ba-
to evaluate student work and to identify empirically-valid sis for automated student guidance and evaluation. This will
substructures using hand-authored rules. In this paper we build upon our prior on the evaluation of a-priori rules for
describe proposed work on the automatic induction of graph student arguments. We will begin with background material
grammars for student data using evolutionary computation on Augmented Graph Grammars and discuss prior work on
via the pyEC system. grammar induction. We will then present an overview of our
planned work.
Keywords
Augmented Graph Grammars, Graph Data, Evolutionary 2. AUGMENTED GRAPH GRAMMARS
Computation & ARGUMENT DIAGRAMS
Classical graph grammars are designed to deal with fixed
1. INTRODUCTION graphs that are composed from a finite set of static node and
Graph Grammars are logical rule representations for graph arc types. Augmented graph grammars are an extension of
structures. They can be designed to encode classes of suit- simple graph grammars that allow for complex node and arc
able graphs to recognize complex sub-features. They were types, optional substructures, and complex rule expressions
introduced by Rosenfeld and Pfaltz in 1969 as “Context-free [12]. Rather than using a fixed alphabet of graph compo-
web grammars” [1]. Since then Graph grammars have been nents they are defined by a complex ontology that allows for
applied to a wide range of areas, including pattern recogni- subsidiary types such as textual fields, access functions, and
tion [2, 3, 4]; visual programming languages [5]; biological directional information. They can also be used to evaluate
development [6]; classification of chemical compounds [7, 8]; negated elements as well as quantified expressions. As such
and social network analysis [9, 10, 11]. Simple graph gram- they are better suited to rich graph data such as user-system
mars are, like string grammars, composed of a set of pro- interaction logs and student-produced argument diagrams.
duction rules that map from one structure to another. In
this case the rules map from a simple subgraph, typically a Augmented Graph Grammars have previously been used for
single node or arc, to a more complex structure. As with the detection of empirically-valid substructures in student-
string grammars the node and arc types are drawn from produced argument diagrams [13, 14]. In that work a-priori
finite alphabets. Despite their utility, however, graph gram- rules were used to represent key discussion features and ar-
mars are very difficult to construct. The data structures gumentative flaws. Argument diagrams are graphical ar-
required are complex [5]. Moreover, development of suitable gument representations that reify key features of arguments
graph grammars generally requires considerable domain ex- such as hypothesis statements, claims, and citations as nodes
pertise. Most existing uses of graph grammars have relied and the supporting, opposing, and informational relation-
on hand-authored rules. ships as arcs between them.

A sample diagram collected in that work is shown in Figure
1 The diagram includes a central research claim node, which
has a single text field indicating the content of the research
claim. A set of citation nodes are connected to the claim
node via a set of supporting, opposing and undefined arcs
colored with green, red and blue respectively. Each citation
node contains two fields: one for the citation information,
and the other for a summary of the cited work; each arc has
a single text field explaining why the relationship holds. At
Figure 1: A student-produced LASAD argument diagram representing an introductory argument.

the bottom of the diagram, there is a single isolated hypoth- t
esis node that contains two text fields, one for a conditional
or IF field, and the other for conditional or THEN field. We
expect the induced graph grammars from a set of argument (P aredW comp) O S
diagrams can be used to evaluate the student thesis work.
a ¬c b
Figure 2 shows an a-priori rule that was defined as part
of that work. This rule is designed to identify a subgraph 
t.T pye = “claim00 or“hypothesis00 

where a single target node t is connected to two separate 
a.T ype = “Citation00
 
citation nodes a and b such that: a is connected to t via an
 
00
opposing path; b is connected via a supporting path; and  b.T ype = “Citation 
c.T ype = “Comparison00
 
there exists no comparison arc between a and b. The rules
 
in that study were implemented using AGG an augmented
graph grammar library built in Python [12]. AGG matches
Figure 2: A simple augmented graph grammar rule
the graphs using recursive stack-based algorithm. The code
that detects compared counterarguments. The rule
first matches the ground nodes at the top-level of the class
shows a two citation nodes (a, & b) that have oppos-
(t, a, & b). It then tests for the recursive productions O,
ing relationships with a shared claim node (t) and do
and S, before finally testing for the negated comparison arc
not have a comparison arc (c) drawn between them.
c. This rule does not make use of the full range of potential
The arcs S and O represent recursive supporting
capacity for Augmented Graph Grammars. However it is
and opposing path.
illustrative of the type of rules we plan to induce here, rules
that generalize beyond basic types and draw on existing pro-
duction classes but not, at least in the immediate term, use
complex textual elements or functional features.
algorithm [19]. They are based upon controlled graph walks
coupled with indexing. While these algorithms are effective,
3. GRAMMAR INDUCTION particularly on smaller graphs, with low vertex degree they
Graph and relational data has grown increasingly prevalent can also overfit simpler graph structures and they do not
and graph analysis algorithms have been applied in a wide scale well to larger, denser graph data [20].
range of domains from social network analysis [15] to bioin-
formatics [16]. Most of this work falls into one of two cate- The SUBDUE system takes a greedy-compression approach
gories of algorithms: frequent subgraph matching, and graph to graph mining. SUBDUE searches for candidate sub-
compression. graphs that can best compress the input graphs by replacing
a candidate subgraph with a single vertex. Then nodes and
A number of algorithms have been developed to discover fre- arcs are added to the vertices to form new candidate sub-
quent subgraphs. These include the gSpan algorithm devel- graphs. The process is recursive and relies on the Minimum-
oped by Yan and Han [17]; the AGM algorithm developed by Description-Length (MDL) principle to evaluate the candi-
Inokuchi et al [18]; and the FSG algorithm developed by Ku- dates. SUBDUE has been applied successfully to extract
ramochi and Karypis which is based on the previous Apriori structure from visual programming [5], web search [21], and
analyzing user behaviors in games [22]. lution space. Therefore it is well suited to our present needs.
This flexibility is costly, however, as EC is far more compu-
While these methods are successful they have practical and tationally expensive than more specialized algorithms, and
theoretical limitations. Both classes of approaches are lim- applications of EC can require a great deal of tuning for
ited to static graphs composed from a finite alphabet of node each use. In the subsections below we will describe the fit-
and arc types. The frequent subgraph approaches are based ness function and the operators that we will use in this work.
upon iterative graph walks and can be computationally ex- For this work we will rely on pyEC a general purpose evo-
pensive and are limited to finding exact matches. They do lutionary computation engine that we have developed [26].
not generalize beyond the exact graphs matched nor do they
allow for recursive typing. SUBDUE, by contrast is a greedy 4.2 Dataset
algorithm that finds the single most descriptive grammar Our initial analysis will be based upon a corpus of expert
and does not allow for weighted matches. graded student produced argument diagrams and essays pre-
viously described in [13, 14]. That dataset was collected as
For our present purposes, however, our goal is to identify part of a study on students’ use of argument diagrams for
multiple heirarchical classes of the type shown in Figure 2 writing that was conducted at the University of Pittsburgh
that can: generalize beyond exact node and arc types; can in 2011. For that study we selected a set of students in an
draw on recursive rule productions; and can be weighted undergraduate-level course on Psychological Research Meth-
based upon the graph quality. Moreover our long-term goal ods. As part of the course the students were tasked with
with this work is to explore graph rule induction mechanisms planning and executing an empirical research study and then
that can be expanded to include textual rules and complex drafting a written report. The students were permitted to
constraints. For that reason we have elected to apply evo- work individually or in teams. This report was structured as
lutionary computation. This is a general-purpose machine a standard empirical workshop paper. Prior to drafting the
learning mechanism that can be tunes to explore a range of report the students were tasked with diagramming the ar-
possible induction mechanisms. gument that they planned to make using LASAD an online
tool for argument diagramming and annotation.
4. METHODS
Subsequent to this data collection process the diagrams and
4.1 Evolutionary Computation essays were graded by an experienced TA using a set of par-
Evolutionary Computation (EC) is a general class of ma- allel grading rubrics. These rubrics focused on the quality of
chine learning and optimization methods that are inspired the arguments in the diagrams and essays and were used to
by the process of Darwinian evolution through natural se- demonstrate that the structure and quality of the diagrams
lection [23] such as Genetic Algorithms [24] or Genetic Pro- can be used to predict the students’ subsequent essay per-
gramming [25]. EC algorithms begin with a population of formance. These grades will be used as the weighting metric
randomly generated candidate solutions such as snippets of for the diagrams and will be correlated with performance as
random code, strings representing a target function, or for- part of the fitness function we describe below. After comple-
mal rules. Each of these solutions is ranked by a fitness tion of the data collection, grading, and testing phases and
function that is used to evaluate the quality of the individu- accounting for student dropout and incomplete assignments
als. These functions can be defined by absolute measures of we collected 105 graded diagram-essay pairs 74 of which were
success such as a suite of test cases, or by relative measures authored by teams.
such as a competition between chess-playing systems.
4.3 Solution Representation
Once the individuals have been ranked a new generation of For the purposes of our present experiments we will use a
individuals is produced through a combination of crossover restricted solution representation that relies on a subset of
and mutation operations. Crossover operations combine two the augmented graph grammar formalism exemplified by the
or more parents to produce one or more candidate children. rule shown in Figure 2. This will include only element types
In Genetic Algorithms where the candidate solutions are rep- and recursive productions. In future work we plan to sup-
resented as strings this can be accomplished by splitting two port the induction of more complex rules defined by multiple
parents at a given index and then exchanging the substrings graph classes, novel productions, and expressions. However
to produce novel children. In Genetic Programming the par- for the present study we will focus on the simple case of
ents exchange blocks of code, functions, or subtrees. Muta- individual classes coupled with predefined productions.
tion operations alter randomly-selected parts of a candidate
solution by swapping out one symbol or instruction for an- 4.4 Fitness Function
other, adding new sub-solutions, or deleting components. We plan to use the frequency correlation metric previously
This process of ranking and regeneration will iterate until employed in [13, 14]. In that study the authors assessed the
a target performance threshold is reached or a maximum empirical validity of a set of a-priori diagram rules. The
number of generations has passed. validity of each individual rule was assessed by testing the
correlation between the frequency of the class in the existing
EC methods are highly general algorithms that can be read- graph and the graph grade. The strength of that correlation
ily adapted to novel domains by selecting an appropriate was estimated using Spearman’s ρ a non-parametic measure
solution representation and modification operations. Thus, of correlation [27]. In that work the authors demonstrated
in contrast to more specific methods such as SUBDUE, the that the a-priori rule frequency was correlated with stu-
EC algorithm allows us to tune the inductive bias of our dents’ subsequent essay grades and showed that the frequen-
search and to explore alternative ways of traversing the so- cies could be used to predict students’ future performance.
4.5 Mutation
Our mutation operator will draw on the predefined graph
ontology to make atomic changes to an existing graph class. B C D F G
The change will be one of the following operations:
1 ∅ 3 A 7 ∅ E
Change Node change an existing node’s type.

Change Arc Change an existing arc’s type or orientation. 4 5 B ∅ F

Delete Node Delete a node and its associated arcs.
∅ C
Delete Arc Delete an existing arc.

Add Node Add a novel node with a specified type.
Figure 3: Canonical matricies for crossover parents.
Add Arc Add an arc between existing nodes or add with
new nodes.

4.6 Crossover B G D F C
By design the crossover operation should, like genetic crossover,
be conservative. Two very similar parents should produce 1 ∅ 3 E 7 ∅ A
similar offspring. Crossover operations should therefore pre-
serve good building blocks and sub-solutions or introns through
random behavior [25]. Arbitrary graph alignment and crossover ∅ 5 B 4 F
is a challenging problem that risks causing unsustainable
changes on each iteration. We therefore treat graph crossover
as a matrix problem. ∅ G

For each pair of parent classes we will define a pair of di-
agonal matricies of the type illustrated in Figure 3. The Figure 4: Canonical matricies for crossover children.
letter indicies on the top and right indicate nodes while the
numerical indicies internally indicate arcs, and the ∅ symbol
indicates that no arc is present. The matricies are gener- grading and feedback. This work represents an improve-
ated in a canonical order based upon the order in which the ment over prior graph grammar induction algorithms which
nodes were added to the class. Thus on each iteration of the are limited to classical graph grammars and greedy extrac-
crossover process the corresponding elements will obtain the tion. This work also represents an extension for evolution-
same index. As a consequence good subsolutions will obtain ary computation by shifting it into a new domain. As part
the same location and will tend to be preserved over time. of this work we also plan to explore additional extensions to
the standard evolutionary computation algorithm to address
Once a set of parent matricies has been generated we then problems of over-fitting such as χ2 reduction.
generate two child matricies of the same size as the parents
and then randomly select the node and arc members. In the
example shown in figures 3 and 4 the parents have nodes 6. REFERENCES
{A,B,C,D} and {E,F,G} while the children have {E,B,G,D} [1] John L Pfaltz and Azriel Rosenfeld. Web grammars.
and {A,F,C}. Thus we align the nodes in canonical order In Proceedings of the 1st international joint conference
and, for each node pair, we flip a coin to decide where they on Artificial intelligence, pages 609–619. Morgan
are copied. If one parent is larger than the other than any Kaufmann Publishers Inc., 1969.
additional nodes, in this case D, will be copied to the larger [2] John L Pfaltz. Web grammars and picture description.
child. We then perform a comparable exchange process for Computer Graphics and Image Processing,
the arcs. Each arc or potential arc is defined uniquely in the 1(2):193–220, 1972.
matricies by its endpoints. We thus align the lists of arcs [3] Horst Bunke. Attributed programmed graph
in a comparable manner and then decide randomly which grammars and their application to schematic diagram
arc, or empty arc, to copy. As with the nodes, extra arcs interpretation. IEEE Transactions on Pattern
from the larger parent, in this case 3,5, and one ∅ are copied Analysis and Machine Intelligence, 4(6):574–582, 1982.
directly into the larger of the two children. [4] Michihiro Kuramochi and George Karypis. Finding
frequent patterns in a large sparse graph*. Data
5. FUTURE WORK mining and knowledge discovery, 11(3):243–271, 2005.
In this paper we presented a method for the induction of [5] Keven Ates, Jacek Kukluk, Lawrence Holder, Diane
augmented graph grammars through evolutionary computa- Cook, and Kang Zhang. Graph grammar induction on
tion. We are presently applying this work to the automatic structural data for visual programming. In Tools with
induction of empirically-valid rules for student-produced ar- Artificial Intelligence, 2006. ICTAI’06. 18th IEEE
gument diagrams. This work will serve to extend our prior International Conference on, pages 232–242. IEEE,
efforts on the use of augmented graph grammars for student 2006.
[6] Francesc Rosselló and Gabriel Valiente. Graph CSREA Press, 2008.
transformation in molecular biology. In Formal [17] Xifeng Yan and Jiawei Han. gspan: Graph-based
Methods in Software and Systems Modeling, pages substructure pattern mining. In Proceedings of the
116–133. Springer, 2005. IEEE International Conference on Data Mining
[7] Luc Dehaspe, Hannu Toivonen, and Ross D King. (ICDM 2002), pages 721–724. IEEE, 2002.
Finding frequent substructures in chemical [18] Akihiro Inokuchi, Takashi Washio, and Hiroshi
compounds. In KDD, volume 98, page 1998, 1998. Motoda. An apriori-based algorithm for mining
[8] Stefan Kramer, Luc De Raedt, and Christoph Helma. frequent substructures from graph data. In Principles
Molecular feature mining in hiv data. In Proceedings of Data Mining and Knowledge Discovery, pages
of the seventh ACM SIGKDD international conference 13–23. Springer, 2000.
on Knowledge discovery and data mining, pages [19] Michihiro Kuramochi and George Karypis. Frequent
136–143. ACM, 2001. subgraph discovery. In Proceedings IEEE International
[9] Wenke Lee and Salvatore J Stolfo. A framework for Conference on Data Mining. (ICDM 2001), pages
constructing features and models for intrusion 313–320. IEEE, 2001.
detection systems. ACM transactions on Information [20] MICHIHIRO Kuramochi and George Karypis. Finding
and system security (TiSSEC), 3(4):227–261, 2000. topological frequent patterns from graph datasets.
[10] Calvin Ko. Logic induction of valid behavior Mining Graph Data, pages 117–158, 2006.
specifications for intrusion detection. In Proceedings of [21] Nitish Manocha, Diane J Cook, and Lawrence B
the IEEE Symposium on Security and Privacy. (S&P Holder. Cover story: structural web search using a
2000), pages 142–153. IEEE, 2000. graph-based discovery system. Intelligence,
[11] Rakesh Agrawal, Ramakrishnan Srikant, et al. Fast 12(1):20–29, 2001.
algorithms for mining association rules. In Proceedings [22] Diane J. Cook, Lawrence B. Holder, and G. Michael
of the 20th International Conference on very large Youngblood. Graph-based analysis of human transfer
data bases, VLDB, volume 1215, pages 487–499, 1994. learning using a game testbed. IEEE Trans. on
[12] Collin F Lynch. Agg: Augmented graph grammars for Knowl. and Data Eng., 19:1465–1478, November 2007.
complex heterogeneous data. In Proceedings of the [23] Charles Darwin. On the Origin of Species by Means of
first international workshop on Graph-Based Natural Selection, or the Preservation of Favoured
Educational Data Mining (GEDM 2014). Races in the Struggle for Life. John Murray:
[13] Collin F. Lynch and Kevin D. Ashley. Empirically Albermarle Street: London, United Kingdom, 6
valid rules for ill-defined domains. In John Stamper edition, 1872.
and Zachary Pardos, editors, Proceedings of The 7th [24] Melanie Mitchell. An Introduction to Genetic
International Conference on Educational Data Mining Algorithms. MIT Press: Cambridge, Massachusetts,
(EDM 2014). International Educational Datamining 1999.
Society IEDMS, 2014. [25] Wolfgang Banzhaf. Genetic programming: an
[14] Collin F. Lynch, Kevin D. Ashley, and Min Chi. Can introduction on the automatic evolution of computer
diagrams predict essay grades? In Stefan programs and its applications. Morgan Kaufmann
Trausan-Matu, Kristy Elizabeth Boyer, Martha E. Publishers ; Heidelburg : Dpunkt-verlag; San
Crosby, and Kitty Panourgia, editors, Intelligent Francisco, California, 1998.
Tutoring Systems, Lecture Notes in Computer Science, [26] Collin F. Lynch, Kevin D. Ashley, Niels Pinkwart, and
pages 260–265. Springer, 2014. Vincent Aleven. Argument graph classification with
[15] Sherry E. Marcus, Melanie Moy, and Thayne Coffman. genetic programming and c4.5. In Ryan
Social network analysis. In Diane J. Cook and Shaun Joazeiro de Baker, Tiffany Barnes, and
Lawrence B. Holder, editors, Mining Graph Data, Joseph E. Beck, editors, The 1st International
chapter 17, pages 443–468. John Wiley & Sons, 2006. Conference on Educational Data Mining, Montreal,
[16] Chang Hun You, Lawrence B. Holder, and Diane J. Québec, Canada, June 20-21, 2008. Proceedings, pages
Cook. Dynamic graph-based relational learning of 137–146, 2008.
temporal patterns in biological networks changing over [27] Wikipedia. Spearman’s rank correlation coefficient —
time. In Hamid R. Arabnia, Mary Qu Yang, and wikipedia, the free encyclopedia, 2013. [Online;
Jack Y. Yang, editors, BIOCOMP, pages 984–990. accessed 27-February-2013].
Communities of Performance
& Communities of Preference

Rebecca Brown Collin Lynch Yuan Wang
North Carolina State North Carolina State Teachers College, Columbia
University University University
Raleigh, NC Raleigh, NC New York, NY
rabrown7@ncsu.edu cflynch@ncsu.edu elle.wang@columbia.edu
Michael Eagle Jennifer Albert Tiffany Barnes
North Carolina State University North Carolina State North Carolina State
Raleigh, NC University University
mjeagle@ncsu.edu Raleigh, NC Raleigh, NC
jennifer_albert@ncsu.edu tmbarnes@ncsu.edu
Ryan Baker Yoav Bergner Danielle McNamara
Teachers College, Columbia Educational Testing Service Arizona State University
University Princeton, NJ Phoenix, AZ
New York, NY ybergner@gmail.com dsmcnamara1@gmail.com
ryanshaunbaker@gmail.com

ABSTRACT the performance of weaker ones. It has not yet been shown,
The current generation of Massive Open Online Courses (MOOCs) however, that this type of support occurs in practice.
operate under the assumption that good students will help poor
students, thus alleviating the burden on instructors and Teaching Prior research on social networks has shown that social groups,
Assistants (TAs) of having thousands of students to teach. In even those that gather face-to-face, can fragment into disjoint
practice, this may not be the case. In this paper, we examine so- sub-communities [37]. This small-group separation, if it takes
cial network graphs drawn from forum interactions in a MOOC place in an online course, can be considered negative or positive,
to identify natural student communities and characterize them depending on one’s perspective. If poor students communi-
based on student performance and stated preferences. We exam- cate only with similarly-floundering peers, then they run the
ine the community structure of the entire course, students only, risk of perpetuating misunderstandings and of missing insights
and students minus low performers and hubs. The presence of discussed by better-performing peers and teaching staff. An
these communities and the fact that they are homogeneous with instructor may wish to avoid this fragmentation to encourage
respect to grade but not motivations has important implications poor students to connect with better ones.
for planning in MOOCs.
These enduring subgroups may be beneficial, however, by help-
Keywords ing students to form enduring supportive relationships. Research
MOOC, social network, online forum, community detection by Li et al. has shown that such enduring relationships can
enhance students’ social commitment to a course [18]. We be-
lieve that this social commitment will in turn help to reduce
1. INTRODUCTION feelings of isolation and alienation among students in a course.
The current generation of Massive Open Online Courses (MOOCs) Eckles and Stradley [9] have shown that such isolation is a key
is designed to leverage student interactions to augment instruc- predictor of student dropout.
tor guidance. The activity in courses on sites such as Coursera
and edX is centered around user forums that, while curated We have previously shown that students can form stable com-
and updated by instructors and TAs, are primarily constructed munities and that those communities are homogeneous with
by students. When planning and building these courses, it is respect to performance [3]. However that work did not: show
hoped that students will help one another through the course whether these results are consistent with prior work on imme-
and that interacting with stronger students will help to improve diate peer relationships; address the impact of hub students on
these results; or discuss whether students’ varying goals and
preferences motivate the community structure. Our goal in this
paper is to build upon our prior work by addressing these issues.
In the remainder of this paper we will survey prior educational
literature on community formation in traditional and online
classrooms. We will then build upon our prior work by exam-
ining the impact of hub users. And we will look at the impact
of user motivations on community formation.
2. RELATED WORK 2.2 Communities, Hubs, & Peers
Kovanovic et al. [15] examined the relationship between social
2.1 MOOCs, Forums, & Student Performance network position or centrality, and social capital formation in
A survey of the literature on MOOCs shows the beginnings of a courses. Their work is specifically informed by the Community
research base generating an abundance of data that has not yet of Inquiry (COI) framework. the COI framework is focused on
been completely analyzed [19]. According to Seaton et al. [29], distance education and is particularly suited to online courses of
most of the time students spend on a MOOC is spent in dis- the type that we study here. The model views course behavior
cussion forums, making them a rich and important data source. through three presences which mediate performance: cognitive,
Stahl et al. [30] illustrates how through this online interaction teaching, and social.
students collaborate to create knowledge. Thus students’ forum
activity is good not only for the individual student posting con- This social presence considers the nature and persistence of
tent or receiving answers, but for the class as a whole. Huang et student interactions and the extent to which they reinforce stu-
al. [14] investigated the behavior of the highest-volume posters dents’ behaviors. In their analysis, the authors sought to test
in 44 MOOC-related forums. These “superposters” tended to whether network relationships, specifically students’ centrality
enroll in more courses and do better in those courses than the in their social graph, is related to their social performance as
average. Their activity also added to the overall volume of forum measured by the nature and type of their interactions. To that
content and they left fewer questions unanswered in the forums. end, they examined a set of course logs taken from a series of
Huang et al. also found that these superposters did not suppress online courses offered within a public university. They found
the activity of less-active users. Rienties et al. [25] examined the that students’ position within their social graph was positively
way in which user interaction in MOOCs is structured. They correlated with the nature and type of their interactions, thus
found that allowing students to self-select collaborators is more indicating that central players also engaged in more useful social
conducive to learning than randomly assigning partners. Further, interactions. They did not extend this work to groups, however,
Van Dijk et al. [31] found that simple peer instruction is signif- focusing solely on individual hub students.
icantly less effective in the absence of a group discussion step,
pointing again to the importance of a class discussion forum. Other authors have also examined the relationship between
network centrality, neighbor relationships, network density, and
More recently Rosé et al. [27] examined students’ evolving inter- student performance factors. Eckles and Stradley [9] applied
actions in MOOCs using a Mixed-Membership Stochastic Block network analysis to student attrition, finding that students with
model which seeks to detect partially overlapping communities. strong social relationships with other students who drop out
They found that the likelihood that students would drop out are significantly more likely to drop out themselves. Rizzuto
of the course is strongly correlated with their community mem- et al. [26] studied the impact of social network density on stu-
bership. Students who actively participated in forums early in dent performance. Network density is defined as the fraction
the course were less likely to drop out later. Furthermore, they of possible edges that are present in a given graph. Thus it
found one forum sub-community that was much more prone is a measure of how “clique-like” the graph is. The authors
to dropout than the rest of the class, suggesting that MOOC examined self-reported social networks for students in a large
communities are made up of students who behave in similar traditional undergraduate psychology course. They found that
ways. This community can in turn reflect or impact a student’s denser social networks were significantly correlated with per-
level of motivation and their overall experience in a course much formance. However, a dominance analysis [1] showed that this
like the “emotional contagion” model used in the Facebook mood factor was less predictive than pure academic ability. These re-
manipulation study by Kramer, Guillroy, and Hancock [16]. sults serve to motivate a focus on the role of social relationships
in student behavior. Their analysis is complicated, however, by
Yang et al. [36] also notes that unlike traditional courses stu- their reliance on self-report data which will skew the strength
dents can join MOOCs at different times and observed that and recency of the reported relationships.
students who join a course early are more likely to be active
and connected in the forums, and less likely to drop out, than Fire et al. [11] studied student interaction in traditional class-
those who join later. MOOCs also attract users with a range of rooms, constructing a social network based on cooperation on
individual motivations. In a standard classroom setting students class assignments. Students were linked based on partnership on
are constrained by availability, convention, and goals. Few stu- group work as well as inferred cooperation based on assignment
dents enroll in a traditional course without seeking to complete submission times and IP addresses. The authors found that a
it and to get formal credit for doing so. MOOCs by virtue of student’s grade was significantly correlated with the grade of
their openness and flexibility attract a wide range of students the student with the strongest links to that student in the social
with unique personal motivations [10]. Some join the course network. We perform similar analysis in this paper to examine
with the intent of completing it. Others may seek only to brush whether the same correlation exists in MOOCs.
up on existing knowledge, obtain specific skills, or just watch
the videos. These distinct motivations in turn lend themselves Online student interaction in blended courses has also been
to different in-class behaviors including assignment viewing and linked to course performance. Dawson [8] extracted student
forum access. The impact of user motivations in online courses and instructor social networks from a blended course’s online
has been previously discussed by Wang et al. [32, 33]; we will discussion forums and found that students in the 90th grade
build upon that work here. Thus it is an open question whether percentile had larger social networks than those in the 10th
these motivations affect students’ community behaviors or not. percentile. The study also found that high-performing students
primarily associated with other high-performing students and
were more likely to be connected to the course instructor, while
low-performing students tended to associate with other low-
performers. In a blended course, this effect may be offset by the same material as a graduate-level course, Core Methods
face-to-face interaction not captured in the online social network, in Educational Data Mining, at Teachers College Columbia
but if the same separation happens in MOOC communities, low- University. The MOOC spanned from October 24, 2013 to
performing students are less likely to have other chances to learn December 26, 2013. The weekly course was composed of lecture
from high-performing ones. videos and 8 weekly assignments. Most of the videos contained
in-video quizzes (that did not count toward the final grade).
2.3 Community Detection
One of the primary activities students engage in on forums All of the weekly assignments were structured as numeric input
is question answering. Zhang et al. [38] conducted a social or multiple-choice questions. The assignments were graded au-
network analysis on an online question-and-answer forum about tomatically. In each assignment, students were asked to conduct
Java programming. Using vertex in-degree and out-degree, they analyses on a data set provided to them and answer questions
were able to identify a relatively small number of active users about it. In order to receive a grade, students had to com-
who answered many questions. This allowed the researchers to plete this assignment within two weeks of its release with up
develop various algorithms for calculating a user’s Java expertise. to three attempts for each assignment, and the best score out
Dedicated question-and-answer forums are more structured than of the three attempts was counted. The course had a total
MOOC forums, with question and answer posts identified, but a enrollment of over 48,000, but a much smaller number actively
similar approach might help identify which students in a MOOC participated. 13,314 students watched at least one video, 1,242
ask or answer the most questions. students watched all the videos, 1,380 students completed at
least one assignment,and 778 made a post or comment in the
Choo et al. [5] studied community detection in Amazon product- weekly discussion sections. Of those with posts, 426 completed
review forums. Based on which users replied to each other most at least one class assignment. 638 students completed the online
often, they found communities of book and movie reviewers who course and received a certificate (meaning that some students
had similar tastes in these products. As in MOOC forums, users could earn a certificate without participating in forums at all).
did not declare any explicit social relationships represented in the
system, but they could still be grouped by implicit connections. In addition to the weekly assignments the students were sent
a survey that was designed to assess their personal motivations
In the context of complex networks, a community structure is a for enrolling in the course. This survey consisted of 3 sets
subgraph which is more densely connected internally than it is to of questions: MOOC-specific motivational items; two PALS
the rest of the network. We chose to apply the Girvan-Newman (Patterns of Adaptive Learning Survey) sub-scales [21], Aca-
edge-betweenness algorithm (GN) [13]. This algorithm takes as demic Efficacy and Mastery-Goal Orientation; and an item
input a weighted graph and a target number of communities. focused on confidence in course completion. It was distributed
It then ranks the edges in the graph by their edge-betweenness to students through the course’s E-mail messaging system to
value and removes the highest ranking edge. To calculate Edge- students who enrolled in the course prior to the official start
betweenness we identify the shortest path p(a,b) between each date. Data on whether participants successfully completed the
pair of nodes a and b in the graph. The edge-betweenness course was downloaded from the same course system after the
of an arc is defined as the number of shortest paths that it course concluded. The survey received 2,792 responses; 38% of
participates in. This is one of the centrality measures explored the participants were female and 62% of the participants were
by Kovanovic et al. above [15]. The algorithm then recalcu- male. All of the respondents were over 18 years of age.
lates the edge-betweenness values and iterates until the desired
number of disjoint community subgraphs has been produced. The MOOC-specific items consisted of 10 questions drawn from
Thus the algorithm operates by iteratively finding and removing previous MOOC research studies (cf. [2, 22]) asking respondents
the highest-value communications channel between communities to rate their reasons for enrollment. These 10 items address
until the graph is fully segmented. For this analysis, we used traits of MOOCs as a novel online learning platform. Specifically,
the iGraph library [7] implementation of G-N within R [24]. these 10 items included questions on both the learning content
and features of MOOCs as a new platform. Two PALS Survey
The strength of a candidate community can be estimated by scales [21] measuring mastery-goal orientation and academic
modularity. The modularity score of a given subgraph is defined efficacy were used to study standard motivational constructs.
as a ratio of its intra-connectedness (edges within the subgraph) PALS scales have been widely used to investigate the relation
to the inter-connectedness with the rest of the graph minus the between a learning environment and a student’s motivation (cf.
fraction of such edges expected if they were distributed at ran- [6, 20, 28]). Altogether ten items with five under each scale
dom [13, 35]. A graph with a high modularity score represents were included. The participants were asked to select a number
a dense sub-community within the graph. from 1 to 5 with 1 meaning least relevant and 5 most relevant.
Respondents were also asked to self-rate their confidence on a
3. DATA SET scale of 1 to 10 as to whether they could complete the course
according to the pace set by the course instructor. All three
This study used data collected from the “Big Data in Education”
groups of items were domain-general.
MOOC hosted on the Coursera platform as one of the inaugural
courses offered by Columbia University [32]. It was created in
response to the increasing interest in the learning sciences and 4. METHODS
educational technology communities in using EDM methods For our analysis, we extracted a social network from the online
with fine-grained log data. The overall goal of this course was forum associated with the course. We assigned a node to each
to enable students to apply each method to answer education student, instructor, or TA in the course who added to it. Nodes
research questions and to drive intervention and improvement in representing students were labeled with their final course grade
educational software and systems. The course covered roughly out of 100 points. The Coursera forums operate as standard
threaded forums. Course participants could start a new thread the network would only share edges with vertices of different
with an initial post, add a post to an existing thread, and add scores. Thus grade assortativity allows us to measure whether
a comment or child element below an existing post. We added individuals are not just connected directly to individuals with
a directed edge from the author of each post or comment to the similar scores but whether they correlate with individuals who
parent post and to all posts or comments that preceded it on are one step removed.
the thread based upon their timestamp. We made a conscious
decision to omit the textual content of the replies with the goal Several commonly studied classes of networks tend to have pat-
of isolating the impact of the structure alone. terns in their assortativity. Social networks tend to have high
assortativity, while biological and technological networks tend
We thus treat each reply or followup in the graph as an implicit to have negative values (dissortativity) [23]. In a homogeneous
social connection and thus a possible relationship. Such implicit course or one where students only form stratified communities
social relationships have been explored in the context of recom- we would expect the assortativity to be very high while in a het-
mender systems to detect strong communities of researchers [5]. erogeneous class with no distinct communities we would expect
This is, by design, a permissive definition that is based upon it to be quite low.
the assumption that individuals generally add to a thread after
viewing the prior content within it and that individual threads 4.2 Community Detection
can be treated as group conversations with each reply being a The process of community detection we employed is briefly de-
conscious statement for everyone who has already spoken. The scribed here [3]. As noted there we elected to ignore the edge
resulting network forms a multigraph with each edge represent- direction when making our graph. Our goal in doing so was to
ing a single implicit social interaction. We removed self loops focus on communities of learners who shared the same threads,
from this graph as they indicate general forum activity but even when they were not directly replying to one-another. We
not any meaningful interaction with another person. We also believe this to be a reasonable assumption given the role of class
removed vertices with a degree of 0, and collapsed the parallel forums as a knowledge-building environment in which students
edges to form a simple weighted graph for analysis. exchange information with the group. Individuals who partic-
ipate in a thread generally review prior posts before submitting
In the analyses below we will focus on isolating student perfor- their contribution and are likely to return to view the followups.
mance and assessing the impact of the faculty and hub students. Homogeneity in this context would mean that students gathered
We will therefore consider four classes of graphs: ALL the com- and communicated primarily with equally-performing peers and
plete graph; Student the graph with the instructor and TAs thus that they did not consistently draw from better-performing
removed; NoHub the graph with the instructor and hub users re- classmates and help lower-performing ones or that the at-will
moved; and Survey which includes only students who completed communities served to homogenize performance, with the stu-
the motivation survey. We will also consider versions of the above dents in a given cluster evening out over time.
graphs without students who obtained a score of 0, and without
the isolated individuals who connect with at most one other While algorithms such as GN are useful for finding clusters they
person. As we will discuss below, a number of students received do not, in and of themselves, determine the right number of
a zero grade in the course. Because this is an at-will course, how- communities. Rather, when given a target number they will seek
ever, we cannot readily determine why these scores were obtained. to identify the best possible set of communities. In some imple-
They may reflect a lack of engagement with the course, differen- mentations the algorithm can be applied to iteratively select the
tial motivations for taking the course, a desire to see the course maximum modularity value over a possible range. Determining
materials without assignments, or genuinely poor performance. the correct number of communities to detect, however, is a
non-trivial task especially in large and densely connected graphs
4.1 Best-Friend Regression & Assortativity where changes to smaller communities will have comparatively
Fire et al. [11] applied a similar social network approach to small effects on the global modularity score. As a consequence
traditional classrooms and found a correlation between a stu- we cannot simply optimize for the best modularity score as we
dent’s most highly connected neighbor (”best friend”) and the would risk missing small but important communities [12].
student’s grade. The links in that graph included cooperation
on assignments as well as partnership on group assignments. Therefore, rather than select the clusterings based solely on
To examine whether the same correlation existed in a massive the highest modularity, we have opted to estimate the correct
online course in which students were less likely to know each number of clusters visually. To that end we plotted a series of
other beforehand and there were no group assignments, we modularity curves over the set of graphs. For each graph G we
calculated each student’s best friend in the same manner and applied the GN algorithm iteratively to produce all clusters in
performed a similar correlation. the range (2,|GN |). For each clustering, we then calculated the
global modularity score. We examined the resulting scores to
The simple best friends analysis gives a straightforward mech- identify a crest where the modularity gain leveled off or began to
anism for correlating individual students. However it is also decrease thus indicating that future subdivisions added no mean-
worthwhile to ask about students who are one-step removed ingful information or created schisms in existing high-quality
from their peers. Therefore we will also calculate the grade communities. This is a necessarily heuristic process that is sim-
assortativity (rG ) of the graphs. Assortativity describes the cor- ilar to the use of Scree plots in Exploratory Factor Analysis [4].
relation of values between vertices and their neighbors [23]. The We define the number identified as the natural cluster number.
assortativity metric r ranges between -1 and 1, and is essentially
the Pearson correlation between vertex and their neighbors [23]. 5. RESULTS AND DISCUSSION
A network with r =1 would have each vertex only sharing edges Before removing self-loops and collapsing the edges, the network
with vertices of the same score. Likewise, if r =−1 vertices in contained 754 nodes and 49,896 edges. The final social network
contained 754 nodes and 17,004 edges. 751 of the participants
were students, with 1 instructor and 2 TAs. One individual was
incorrectly labeled as a student when they were acting as the
Chief Community TA. Since this person’s posts clearly indicated
that he or she was acting in a TA capacity with regard to the
forums, we relabeled him/her as a TA. Of the 751 students 304
obtained a zero grade in the course leaving 447 nonzero students.
215 of the 751 students responded to the motivation survey.

There were a total of 55,179 registered users, so the set of 754
forum participants is a small fraction of the entire course audi-
ence. However, forum users are not necessarily those who will
make an effort or succeed in the course. Forum users did not all
participate in the course, and some students who participated in
the course did not use the forums: 1,381 students in the course
got a grade greater than 0, and 934 of those did not post or
comment on the forums, while 304 of the 751 students who did Figure 1: Modularity for each number of clusters,
participate in the forums received a grade of 0. Clearly students including students with zeros.
who go to the trouble of posting forum content are in some
respect making an effort in the course beyond those who don’t,
but this does not necessarily correspond to course success.

5.1 Best-Friend Regression & Assortativity
We followed Fire et al.’s methodology for identifying Best Friends
in a weighted graph and calculated a simple linear regression
over the pairs. This correlation did not include the instructor or
TAs in the analysis. We calculated the correlation between the
students’ grades to their best friends’ grades in the set using
Spearman’s Rank Correlation Coefficient (ρ) [34]. The two vari-
ables were strongly correlated, ρ(748)=0.44, p<0.001. However,
the correlation was also affected by the dense clusters of students
with 0 grades. After removing the 0 grade students we found
an additional moderate correlation, ρ(444)=0.29, p<0.001.

Thus the significant correlation between best-friend grade and
grade holds over the transition from the traditional classroom to
Figure 2: Modularity for each number of clusters,
a MOOC. This suggests that students in a MOOC, excluding the
excluding students with zeros.
many who drop out or do not submit assignments, behave sim-
ilarly to those in a traditional classroom in this respect. These
results are also consistent with our calculations for assortativity. 5.2 Community Structure
There we found a small assortative trend for the grades as shown The modularity curves for the graphs both with and without
in Table 1. These values reflect that a student was frequently zero-score students are shown in Figures 1 and 2. We exam-
communicating with students who in turn communicated with ined these plots to select the natural cluster numbers which are
students at a similar performance level. This in turn supports our shown in Table 2. As the values illustrate the instructor, TAs,
belief that homogeneous communities may be found. As Table and hub students have a disproportionate impact on the graph
1 also illustrates, the zero-score students contribute substan- structure. The largest hub student in our graph connects to
tially to the assortativity correlation as well with the correlation 444 out of 447 students in the network. The graph with all
dropping by as much as a third when they were removed. users had lower modularity and required more clusters than the
graphs with only students or only non-hubs (see Table 2), with

Table 2: Graph sizes and natural number of clusters
for each graph.
Table 1: The grade assortativity for each network. Users Zeros V E Clusters
Users Zeros V E rG
All Yes 754 17004 212
All Yes 754 17004 0.29 All No 447 5678 173
All No 447 5678 0.20 Students Yes 751 15989 184
Students Yes 751 15989 0.32 Students No 447 5678 169
Students No 447 5678 0.20 Non-Hub Yes 716 9441 79
Non-Hub Yes 716 9441 0.37 Non-Hub No 422 3119 52
Non-Hub No 422 3119 0.24 Survey Yes 215 1679 58
Figure 3: View of the student communities with edges of frequency <2 removed. The Student network with (left)
and without (right) hub-students, with each vertex representing a student and grade represented as color.

the non-hub graph having the highest modularity. This suggests
that non-hub students formed more isolated communities, while Table 3: Grade statistics by community, selected
teaching staff and hubs communicated across these communities to show examples of more and less homogeneous
and connected them. communities.
Members Average Grade Standard Deviation
This largely consistent with the intent of the forums and the 118 21.62 36.58
active role played by the instructor and TAs in monitoring and 41 22.00 32.45
replying to all relevant posts in the forums. It is particularly in- 34 25.41 40.44
teresting how closely the curves for the ALL and Student graphs 31 56.13 47.69
mirror one another. This may indicate that the hub students are 20 49.05 45.64
also those that followed the instructor and TAs closely, thus giv- 16 12.44 31.13
ing them isomorphic relationships, or it may indicate that they 14 88.43 22.47
are more connected than even the instructors and thus came to 12 96.08 6.36
bind the forums together on their own. This impact is further 11 96.45 7.38
illustrated by the cluster plots shown in Figure 3. Here the ab- 4 3.00 6.00
sence of the hub students results in a noticeable thinning of the 4 8.50 9.81
graph which in turn highlights the frequency of communication 4 4.25 8.50
that can be attributed to this, comparatively small, group. 4 96.25 3.50

The difference between the full plots and those with zero values
are also notable as the zero grade students were clearly a major
standard deviation for a small selection of the communities in
factor in community formation. A direct examination of the
the ALL reply network including zero-grades, hub students,
user graph showed that many of the zero students were only
and teaching staff. Several of the communities, particularly
connected to other zero students or were not connected at all.
the larger ones, do show a blend of good and poor students,
This is also highlighted in Figure 3. In both graphs the bulk of
with a high standard deviation. However many if not most of
the zero score students are clustered in a tight network of com-
the communities are more homogeneous with good and poor
munities on the left-hand side. That super-community consists
students sharing a community with similarly-performing peers.
primarily of zero score students communicating with other zero-
These clusters have markedly lower standard deviation.
score students, a structure we have nick-named the ‘deathball.’
An examination of the grade distribution for each of the clusters
5.3 Student Performance & Motivation showed that the scores within each cluster were non-normal.
As the color coding in Figure 3 illustrates, the students did Therefore we opted to apply the Kruskal-Wallis (KW) test to
cluster by performance. Table 3 shows the average grade and assess the correlation between cluster membership and perfor-
We also found that community membership was not a significant
Table 4: Kruskal-Wallis test of student grade by predictor of whether students would complete the motivation
community, for each graph. survey or of students’ motivations. We were surprised by the
Users Zeros Chi-Squared df p-value fact that even when we focused solely on individuals who had
All Yes 349.0273 211 < 0.005 completed the survey, the students did not connect by stated
All No 216.1534 172 < 0.02 goals. This suggests to us that the students are more likely
Students Yes 202.0814 78 < 0.005 coalescing around the pragmatic needs of the class or conceptual
Students No 80.93076 51 < 0.005 challenges rather than on the winding paths that brought them
Non-Hub Yes 309.8525 183 < 0.005 there. One limitation of this work is that by relying on the
Non-Hub No 218.9603 168 < 0.01 forum data we were focused solely on the comparatively small
Survey Yes 99.99840 577 < 0.005 proportion of enrolled students (6%) who actively participated
in the forums. This group is, by definition a smaller set of more
actively-involved participants.
mance. The KW test is a nonparametric rank-based analogue
In addition to addressing our primary questions this study also
to the common Analysis of Variance [17]. Here we tested grade
raised a number of open issues for further exploration. Firstly,
by community number with the community being treated as a
this work focused solely on the final course structure, grades, and
categorical variable. The results of this comparison are shown
motivations. We have not yet addressed whether these commu-
in Table 4. As that illustrates, cluster membership was a sig-
nities are stable over time or how they might change as students
nificant predictor of student performance for all of the graphs
drop in our out. Secondly, while we ruled out motivations as a
with the non-zero graphs having markedly lower p-values than
basis for the community this work we were not able to identify
those with zero students included. These results are consistent
what mechanisms do support the communities. And finally this
with our hypothesis that students would form clusters of equal-
study raises the question of generality and whether or not these
performers and we find that those results hold even when the
results can be applied to MOOCs offered on different topics or
highly-connected instructors, TAs and hub students are included.
whether the results apply to traditional and blended courses.
We performed a similar KW analysis for the questions on the
In subsequent studies we plan to examine both the evolution of
motivation survey and for a binary variable indicating whether
the networks over time as well as additional demographic data
or not the student completed the survey at all. For this analysis
with the goal of assessing both the stability of these networks
we evaluated the clusters on all of the graphs. We found no
and the role of other potential latent factors. We will also
significant relationship between the community structure on
examine other potential clustering mechanisms that control for
any of the graphs and the survey question results or the survey
other user features such as frequency of involvement and thread
completion variable. Thus while the clusters may be driven by
structure. We also plan to examine other similar datasets to
separate factors they are not reflected in the survey content.
determine if these features transition across classes and class
types. We believe that these results may change somewhat once
6. CONCLUSIONS AND FUTURE WORK students can coordinate face to face far more easily than online.
Our goal in this paper was to expand upon our prior community
detection work with the goal of aligning that work with prior
research on peer impacts, notably the work of Fire et al. [11]. 7. ACKNOWLEDGMENTS
We also sought to examine the impact of hub students and This work was supported by NSF grant #1418269: “Modeling
student motivations on our prior results. Social Interaction & Performance in STEM Learning” Yoav
Bergner, Ryan Baker, Danielle S. McNamara, & Tiffany Barnes
To that end we performed a novel community clustering analysis Co-PIs.
of student performance data and forum communications taken
from a single well-structured MOOC. As part of this analysis we 8. REFERENCES
described a novel heuristic method for selecting natural numbers [1] R. Azen and D. Budescu. The dominance
of clusters, and replicated the results of prior studies of both analysis approach for comparing predictors in multiple
immediate neighbors and second-order assortativity. regression. Psychological Methods, 8(2):129–48, 2003.
[2] Y. Belanger and J. Thornton.
Consistent with prior work, we found that students’ grades Bioelectricity: A quantitative approach Duke University’s
were significantly correlated with their most closely associated first MOOC. Journal of Learning Analytics, 2013.
peers in the new networks. We also found that this correlation [3] R. Brown, C. F. Lynch, M. Eagle, J. Albert, T. Barnes,
extended out to their second-order neighborhood. This is consis- R. Baker, Y. Bergner, and D. McNamara. Good
tent with our prior work showing that students form stable user communities and bad communities: Does membership
communities that are homogeneous by performance. We found affect performance? In C. Romero and M. Pechenizkiy,
that those results were stable even if instructors, hub players, editors, Proceedings of the 8th International
students with 0 scores, and students who did not fill out the sur- Conference on Educational Data Mining, 2015. submitted.
vey were removed from consideration. This suggests that either
[4] R. B. Cattell. The scree test for the number of factors.
the students are forming communities that are homogeneous or
Multivariate Behavioral Research, 1(2):245–276, 1966.
that the effect of those individual and network features on the
communities and on performance is minimal. [5] E. Choo, T. Yu, M. Chi, and
Y. Sun. Revealing and incorporating implicit communities
to improve recommender systems. In M. Babaioff,
V. Conitzer, and D. Easley, editors, ACM Conference
on Economics and Computation, EC ’14, Stanford, structure, student motivation, and academic achievement.
CA, USA, June 8-12, 2014, pages 489–506. ACM, 2014. Annual Review of Psychology, 57:487–503, 2006.
[6] K. Clayton, F. Blumberg, [21] C. Midgley, M. L.
and D. P. Auld. The relationship between motivation Maehr, L. Hruda, E. Anderinan, L. Anderman, and K. E.
learning strategies and choice of environment whether Freeman. Manual for the Patterns of Adaptive Learning
traditional or including an online component. British Scales (PALS). University of Michigan, Ann Arbor, 2000.
Journal of Educational Technology, 41(3):349–364, 2010. [22] MOOC @ Edinburgh 2013. MOOC @ Edinburgh
[7] G. Csardi and T. Nepusz. 2013 - report #1. Journal of Learning Analytics, 2013.
The igraph software package for complex network [23] M. E. Newman. Assortative Mixing in Networks.
research. InterJournal, Complex Systems:1695, 2006. Physical Review Letters, 89(20):208701, Oct. 2002.
[8] S. Dawson. ’Seeing’ the learning [24] R Core Team. R: A Language
community: An exploration of the development of a and Environment for Statistical Computing. R Foundation
resource for monitoring online student networking. British for Statistical Computing, Vienna, Austria, 2012.
Journal of Educational Technology, 41(5):736–752, 2010. [25] B. Rienties, P. Alcott, and D. Jindal-Snape.
[9] J. Eckles and E. Stradley. A To let students self-select or not: That is the
social network analysis of student retention using archival question for teachers of culturally diverse groups. Journal
data. Social Psychology of Education, 15(2):165–180, 2012. of Studies in International Education, 18(1):64–83, 2014.
[10] A. Fini. The technological [26] T. Rizzuto, J. LeDoux, and J. Hatala. It’s not just what you
dimension of a massive open online course: The know, it’s who you know: Testing a model of the relative
case of the CCK08 course tools. The International Review importance of social networks to academic performance.
Of Research In Open And Distance Learning, 10(5), 2009. Social Psychology of Education, 12(2):175–189, 2009.
[11] M. Fire, [27] C. P. Rosé, R. Carlson, D. Yang, M. Wen, L. Resnick,
G. Katz, Y. Elovici, B. Shapira, and L. Rokach. Predicting P. Goldman, and J. Sherer. Social factors that contribute to
student exam’s scores by analyzing social network data. In attrition in MOOCs. In Proc. of the first ACM conference
Active Media Technology, pages 584–595. Springer, 2012. on Learning@ scale conference, pages 197–198. ACM, 2014.
[12] S. Fortunato and M. Barthélemy. [28] A. M. Ryan and H. Patrick. The classroom
Resolution limit in community detection. Proc. social environment and changes in adolescents’ motivation
of the National Academy of Sciences, 104(1):36–41, 2007. and engagement during middle school. American
[13] M. Girvan and M. E. J. Newman. Community structure Educational Research Journal, 38(2):437–460, 2001.
in social and biological networks. Proc. of the National [29] D. Seaton, Y. Bergner, I. Chuang, P. Mitros, and
Academy of Sciences, 99(12):7821–7826, June 2002. D. Pritchard. Who does what in a massive open online
[14] J. Huang, A. Dasgupta, A. Ghosh, course? Communications of the ACM, 57(4):58–65, 2014.
J. Manning, and M. Sanders. Superposter behavior [30] G. Stahl, T. Koschmann,
in MOOC forums. In Proc. of the first ACM conference and D. Suthers. Computer-supported collaborative
on Learning@ scale conference, pages 117–126. ACM, 2014. learning: An historical perspective. Cambridge
[15] V. Kovanovic, S. Joksimovic, D. Gasevic, and M. Hatala. handbook of the learning sciences, 2006:409–426, 2006.
What is the source of social capital? the association [31] L. Van Dijk, G. Van Der Berg, and H. Van Keulen.
between social network position and social presence in Interactive lectures in engineering education. European
communities of inquiry. In S. G. Santos and O. C. Santos, Journal of Engineering Education, 26(1):15–28, 2001.
editors, Proc. of the Workshops held at Educational [32] Y. Wang and R. Baker. Content or platform:
Data Mining 2014, co-located with 7th International Why do students complete MOOCs? MERLOT Journal
Conference on Educational Data Mining (EDM of Online Learning and Teaching, 11(1):191–218, 2015.
2014), London, United Kingdom, July 4-7, 2014., volume [33] Y. Wang, L. Paquette, and R. Baker.
1183 of CEUR Workshop Proc. CEUR-WS.org, 2014. A longitudinal study on learner career advancement
[16] A. D. I. Kramer, J. E. Guillory, in MOOCs. Journal of Learning Analytics, 1(3), 2014.
and J. T. Hancock. Experimental evidence of massive-scale [34] Wikipedia. Spearman’s
emotional contagion through social networks. Proc. of the rank correlation coefficient — Wikipedia, the free
National Academy of Sciences, 111(24):8788–8790, 2014. encyclopedia, 2013. [Online; accessed 27-February-2013].
[17] W. H. Kruskal and W. A. Wallis. Use [35] Wikipedia. Modularity (networks) — Wikipedia, the free
of ranks in one-criterion variance analysis. Journal of the encyclopedia, 2014. [Online; accessed 5-February-2015].
American statistical Association, 47(260):583–621, 1952. [36] D. Yang, T. Sinha, D. Adamson, and C. P. Rose. Turn on,
[18] N. Li, H. Verma, A. Skevi, tune in, drop out: Anticipating student dropouts in massive
G. Zufferey, J. Blom, and P. Dillenbourg. Watching open online courses. In Proc. of the 2013 NIPS Data-Driven
MOOCs together: investigating co-located MOOC Education Workshop, volume 10, page 13, 2013.
study groups. Distance Education, 35(2):217–233, 2014. [37] W. W. Zachary. An information
[19] T. R. Liyanagunawardena, A. A. Adams, and S. A. flow model for conflict and fission in small groups.
Williams. MOOCs: A systematic study of the published Journal of Anthropological Research, 33:452–473, 1977.
literature 2008-2012. The International Review of Research [38] J. Zhang, M. S. Ackerman, and L. Adamic.
in Open and Distributed Learning, 14(3):202–227, 2013. Expertise networks in online communities: structure and
[20] J. L. Meece, algorithms. In Proc. of the 16th international conference
E. M. Anderman, and L. H. Anderman. Classroom goal on World Wide Web, pages 221–230. ACM, 2007.
Using the Hint Factory to
Compare Model-Based Tutoring Systems

Collin Lynch Thomas W. Price
North Carolina State North Carolina State
University University
890 Oval Drive 890 Oval Drive
Raleigh, NC 27695 Raleigh, NC 27695
cflynch@ncsu.edu twprice@ncsu.edu
Min Chi Tiffany Barnes
North Carolina State North Carolina State
University University
890 Oval Drive 890 Oval Drive
Raleigh, NC 27695 Raleigh, NC 27695
mchi@ncsu.edu tmbarnes@ncsu.edu

ABSTRACT Model-based tutoring systems are based upon classical ex-
Model-based tutoring systems are driven by an abstract do- pert systems, which represent relevant domain knowledge
main model and solver that is used for solution validation via static rule bases or sets of constraints [9]. These knowl-
and student guidance. Such models are robust but costly edge bases are generally designed by domain experts or with
to produce and are not always adaptive to specific students’ their active involvement. They are then paired with classical
needs. Data-driven methods such as the Hint Factory are search algorithms or heuristic satisfaction algorithms to au-
comparatively cheaper and can be used to generate indi- tomatically solve domain problems, identify errors in student
vidualized hints without a complete domain model. In this solutions, and to provide pedagogical guidance. The goal of
paper we explore the application of data-driven hint analy- the design process is to produce expert models that give the
sis of the type used in the Hint Factory to existing model- same procedural advice as a human expert. Classical model-
based systems. We present an analysis of two probability based tutors have been quite successful in field trials, with
tutors Andes and Pyrenees. The former allows for flexible systems such as the ACT Programming Tutor helping stu-
problem-solving while the latter scaffolds students’ solution dents achieve almost two standard deviations higher than
path. We argue that the state-space analysis can be used to those receiving conventional instruction [5].
better understand students’ problem-solving strategies and
can be used to highlight the impact of different design de- Data-driven hint generation methods such as those used in
cisions. We also demonstrate the potential for data-driven Hint Factory [17] take a different approach. Rather than us-
hint generation across systems. ing a strong domain model to generate a-priori advice, data-
driven systems examine prior student solution attempts to
identify likely paths and common errors. This prior data
1. INTRODUCTION can then be used to provide guidance by directing students
Developers of model-based tutoring systems draw on do- towards successful paths and away from likely pitfalls. In
main experts to develop ideal models for student guidance. contrast to the expert systems approach, these models are
Studies of such systems have traditionally been focused on primarily guided not by what experts consider to be ideal
their overall impact on students’ performance and not on the but by what students do.
students’ user-system interaction. The Hint Factory, by con-
trast, takes a data-driven approach to extract advice based Model-based systems such as Andes [18] are advantageous
upon students’ problem solving paths. In this paper we will as they can provide appropriate procedural guidance to stu-
apply the Hint Factory analytically to evaluate the impact dents at any point in the process. Such models can also be
of user interface changes and solution constraints between designed to reinforce key meta-cognitive concepts and ex-
two closely-related tutoring systems for probability. plicit solution strategies [4]. They can also scale up rapidly
to include new problems or even new domain concepts which
can be incorporated into the existing system and will be
available to all future users. Rich domain models, however,
are comparatively expensive to construct and require the
long-term involvement of domain experts to design and eval-
uate them.

Data-driven methods for generating feedback, by contrast,
require much lower initial investment and can readily adapt
to individual student behaviors. Systems such as the Hint
Factory are designed to extract solutions from prior student
data, to evaluate the quality of those solutions, and to com-
pile solution-specific hints [17]. While this avoids the need
for a strong domain model, it is limited to the space of so-
lutions explored by prior students. In order to incorporate
new problems or concepts it is necessary to collect additional
data. Additionally, such methods are not generally designed
to incorporate or reinforce higher-level solution strategies.

We believe that both of these approaches have inherent ad-
vantages and are not necessarily mutually exclusive. Our
goal in this paper is to explore what potential data-driven
methods have to inform and augment model-based systems.
We argue that data-driven methods can be used to: (1)
evaluate the differences between closely-related systems; (2)
assess the impact of specific design decisions made in those Figure 1: The Andes user interface showing the
systems for user behaviors; and (3) evaluate the potential ap- problem statement window with workspace on the
plication of data-driven hint generation across systems. To upper left hand side, the variable and equation win-
that end we will survey relevant prior work on model-based dows on the right hand side, and the dialogue win-
and data-driven tutoring. We will describe two closely- dow on the lower left.
related tutoring systems and data collected from them. We
will then present a series of analyses using state-based meth-
ods and discuss the conclusions that we drew from them.
is associated with a pre-compiled solution graph that defines
the set of possible solutions and problem-solving steps. The
2. BACKGROUND system uses a principle-driven automated problem solver to
compile these graphs and to identify the complete solution
2.1 Model-Based Tutoring paths. The solver is designed to implement the Target Vari-
Model-based tutoring systems take a classical expert-systems able Strategy (TVS), a backward-chaining problem solving
approach to tutoring. They are typically based upon a strategy that proceeds from a goal variable (in this case
strong domain model composed of declarative rules and facts the answer to the problem) via principle applications to the
representing domain principles and problem-solving actions given information. The TVS was designed with the help of
coupled with an automatic problem solver. This knowledge domain experts and guides solvers to define basic solution
base is used to structure domain knowledge, define individ- information (e.g. given variables) and then to proceed from
ual problems, evaluate candidate solutions, and to provide the goal variable and use principles to define it in terms of
student guidance. Novices typically interact with the system the given variables.
through problem solving with the system providing solution
validation, automatic feedback, pedagogical guidance, and Students working with Andes use a multi-modal user inter-
additional problem-solving tasks. The Sherlock 2 system, for face to write equations, define variables and engage in other
example, was designed to teach avionics technicians about atomic problem-solving steps. A screenshot of the Andes UI
appropriate diagnostic procedures [11]. The system relies on can be seen in Figure 1. Andes allows students to solve prob-
a domain model that represents the avionics devices being lems flexibly, completing steps in any order so long as they
tested, the behavior of the test equipment, and rules about are valid [20]. A step is considered to be valid if it matches
expert diagnostic methods. Sherlock 2 uses these models one or more entries in the saved solution paths and all nec-
to pose dynamic challenges to problem solvers, to simulate essary prerequisites have been completed. Invalid steps are
responses to their actions, and to provide solution guidance. marked in red, but no other immediate feedback is given.
Andes does not force students to delete or fix incorrect en-
Andes [19, 18, 20] and Pyrenees [4] are closely-related model- tries as they do not affect the solution process. In addi-
driven ITSs in the domains of physics and probability. They tion to validating entries, the Andes system also uses the
were originally developed at the University of Pittsburgh un- precompiled solution graphs to provide procedural guidance
der the Direction of Dr. Kurt VanLehn. Like other model- (next-step-help). When students request help, the system
based systems, they rely on a rule-based domain model and will map their work to the saved solution paths. It will then
automatic problem solvers that treat the domain rules as select the most complete solution and prompt them to work
problem-solving steps. They distinguish between higher- on the next available step.
level domain concepts such as Bayes’ Rule, and atomic steps
such as variable definitions. Principles are defined by a cen- One of the original goals of the Andes system was to de-
tral equation (e.g. p(A|B) = (p(B|A) ∗ p(A))/p(B)) and velop a tutor that operated as an “intelligent worksheet.”
encapsulate a set of atomic problem-solving steps such as The system was designed to give students the freedom to
writing the equation and defining the variables within it. solve problems in any order and to apply their preferred
solution strategy. The system extends this freedom by al-
The systems are designed to function as homework-helpers, lowing invalid steps in an otherwise valid solution and by
with students logging into the system and being assigned or allowing students to make additional correct steps that do
selecting one of a set of predefined problems. Each problem not advance the solution state or are drawn from multiple
state of a student’s partial solution at some point during the
problem solving process, and each edge represents an action
that takes the student from one state to another. A complete
solution is represented as a path from the initial state to a
goal state. Each state in the interaction network is assigned
a weight via a value-iteration algorithm. A new student re-
questing a hint is matched to a previously observed state and
given context-sensitive advice. If, for example, the student is
working on a problem that requires Bayes’ Rule and has al-
ready defined p(A), p(B), and p(B|A) then the Hint Factory
would first prompt them to consider defining p(A|B), then
it would point them to Bayes Rule, before finally showing
them the equation p(A|B) = (p(B|A) ∗ p(A))/p(B).

These hints are incorporated into existing tutoring systems
Figure 2: The Pyrenees user interface showing the in the form of a lookup table that provides state-specific
problem statement at the top, the variable and equa- advice. When a user asks for help the tutor will match their
tion lists on the left, and the tutor interaction win- current state to an index state in the lookup table and will
dow with calculator on the lower right. prompt them to take the action that will lead them to the
highest value neighboring state. If their current state is not
found then the tutor will look for a known prior state or
solution paths. This was motivated in part by a desire to will give up. The Hint Factory has been applied successfully
make the system work in many different educational con- in a number of domains including logic proofs [17], data
texts where instructors have their own preferred methods structures [8], and programming [15, 10, 13]. Researchers
[20]. The designers of Andes also consciously chose only to have also explored other related methods for providing data-
provide advice upon demand when the students would be driven hints. These include alternative state representations
most willing to accept it. For the students however, par- [13], path construction algorithms [16, 14], and example-
ticularly those with poor problem-solving skills, this passive based model-construction [12].
guidance and comparative freedom can be problematic as it
does not force them to adhere to a strategy. The primary goal of the Hint Factory is to leverage prior
data to provide optimal state-specific advice. By calculating
This problem motivated the development of Pyrenees. Pyre- advice on a per-state basis, the system is able to adapt to
nees, like Andes acts as a homework helper and supports stu- students’ specific needs by taking into account both their
dents with on-demand procedural and remediation help. It current state and the paths that they can take to reach the
uses an isomorphic domain model with the same principles, goal. As a consequence the authors of the Hint Factory
basic steps, problems, and solution paths. Unlike Andes, argue that this advice is more likely to be in the students’
however, Pyrenees forces students to applying the target- Zone of Proximal Development and thus more responsive to
variable-strategy during problem solving. It also requires their needs than a less-sensitive algorithm.
them to repair incorrect entries immediately before moving
on. Students are guided through the solution process with 3. METHODS
a menu-driven interface, shown in Figure 2. At each step, In order to investigate the application of data-driven meth-
the system asks students what they want to work on next ods to model-based tutoring systems, we collected data from
and permits them to make any valid step that is consistent two studies conducted with Andes and Pyrenees in the do-
with the TVS. Chi and VanLehn [3] conducted a study of the main of probability. We then transformed these datasets
two systems and found that scaffolding the TVS in Pyrenees into interaction networks, consisting of states linked with
helped to eliminate the gap between high and low learners. actions. We used this representation to perform a variety of
This effect was observed both in the original domain where quantitative and qualitative analyses with the goal of eval-
it was taught (in their case probability) and it transferred to uating the differences between the two systems and the im-
a new domain (physics), where students used Andes alone. pact of the specific design decisions that were made in each.

2.2 Data-Extraction and 3.1 The Andes and Pyrenees Datasets
Data-Driven Tutoring. The Andes dataset was drawn from an experiment con-
One of the longstanding goals of educational data-miners is ducted at the University of Pittsburgh [3]. This study was
to support the development of data-driven tutoring systems. designed to assess the differential impact of instruction in
Such systems use past student data to structure pedagogical Andes and Pyrenees on students’ meta-cognitive and problem-
and domain knowledge, administer conceptual and pedagog- solving skills. Participants in this study were college under-
ical advice, or evaluate student performance and needs. A graduates who were required to have taken high-school level
number of attempts have been made to address these goals. algebra and physics but not to have taken a course in prob-
One of the most successful data-driven systems is the Hint ability or statistics. The participants were volunteers and
Factory [1, 2, 17]. The Hint Factory takes an MDP-based were paid by time not performance.
approach to hint generation. It takes as input a set of prior
student logs for a given problem, represented as a network of Forty-four students completed the entire study. However
interactions [6, 7]. Each vertex in this network represents the for the purposes of the present analysis, we drew on all
66 students who completed at least one problem in Andes- tions, this representation uniquely identifies any equation
Probability. This is consistent with prior uses of the Hint for a given problem. Because we used the same state rep-
Factory which draw from all students including those who resentation for both tutors, we were able to compare states
did not complete the problem. The Pyrenees-Probability directly across tutors.
logs from this study were not used due to problems with the
data format that prevented us from completing our analysis. Additionally, we opted to ignore incorrect entries. Pyrenees
From this dataset we drew 394 problem attempts covering prevents students from applying the principles of probabil-
11 problems. The average number of steps required to solve ity improperly and forces them to correct any mistakes made
the problems was 17.6. For each problem we analyzed be- immediately therefore any errors in the student logs are im-
tween 25 and 72 problem attempts, with an average of 35.8 mediately removed making the paths uninformative. Andes,
attempts per problem. Some attempts were from the same by contrast, gives students free reign when writing equations
student, with at most two successful attempts per student. and making other entries. This freedom resulted in syntac-
Over all problems, 81.7% of the attempts were successful, tic errors and improper rule application errors arising in our
with the remainder being incomplete attempts. dataset. The meaning of these invalid equations is inherently
ambiguous and therefore difficult to incorporate into a state
The Pyrenees dataset was drawn from a study of 137 stu- definition. However such errors are immediately flagged by
dents conducted in the 200-level Discrete Mathematics course the system and may be ignored by the student without con-
in the Department of Computer Science at North Carolina sequence as they do not affect the answer validity therefore
State University. This study used the same probability text- they may be safely ignored as well.
book and pre-training materials as those used in the Andes
study. The students used Pyrenees as part of a homework An edge, or action, in our network represents the correct
assignment, in which they completed 12 problems using the application of a rule or a correct variable definition and leads
tutoring system. One of these problems was not represented to a transition from one state to another. For the present
in the Andes dataset. We therefore excluded it from our dataset and state representation, the possible actions were
analysis, leaving 11 shared problems. the definition or deletion of variables or equations. Each of
these actions was possible in both tutors.
Unlike the Andes students, however, the Pyrenees students
were not always required to solve every problem. In this 4. ANALYSIS
study the system was configured to randomly select some In order to develop a broader understanding of our datasets,
problems or problem steps to present as worked examples we first visualized the interaction network for each problem
rather than as steps to be completed. In order to ensure as a weighted, directed graph. We included attempts from
that the results were equivalent we excluded the problem- both Andes and Pyrenees in the network, and weighted the
level worked examples and any attempt with a step-level edges and verticies by the frequency with which it appeared
worked example from our analysis. As a consequence, each in the logs. We annotated each state and edge with the
problem included a different subset of these students. For weight contributed by each tutor. Two examples of these
each problem we analyzed between 83 and 102 problem at- graphs are given in Figure 3.
tempts, with an average of 90.8 attempts per problem. Some
attempts were from the same student, with at most one suc- Throughout this section, we will use these graphs to address
cessful attempt per student. Over all problems, 83.4% of the points we outlined at the end of Section 1. We begin
the attempts were successful. with a case study from one problem and will explore the
student problem solving strategies using our graph repre-
3.2 State and Action Representations sentation. We will then compare the Andes and Pyrenees
In order to compare the data from both tutors, we repre- Systems with a variety of metrics based on this represen-
sented each problem as an interaction network, a represen- tation. We will relate our observations back to the design
tation used originally in the Hint Factory [7]. In the net- decisions of each system and identify evidence that may sup-
work a vertex, or state, represents the sum total of a stu- port or question these decisions. Finally, we will show how
dents’ current problem solving steps at a given time during the analysis methods associated with data-driven hint gen-
a problem-solving attempt. Because Andes permits flexible eration can be used to validate some of these findings.
step ordering while Pyrenees does not, we chose to represent
the problem solving state st as the set of valid variables and 4.1 Case Study: Problem Ex242
equations defined by the student at time t. The graphical representation of a problem is very helpful for
giving a high-level overview of a problem and performing
A variable is a probabilistic expression, such as P (A ∪ B), qualitative analysis. Problem Ex242, shown in Figure 3,
that the student has identified as important to solving the presents an interesting scenario for a number of reasons. The
problem, for which the probability is known or sought. An problem was the 10th in a series of 12 practice problems, and
equation represents the application of a principle of proba- asked the following:
bility, which relates the values of defined variables, such as
the Complement Theorem, P (A) + P (¬A) = 1. Because
such equations can be written in many algebraically equiv- Events A, B and C are mutually exclusive and
alent ways, we represent each equation as a 2-tuple, con- exhaustive events with p(A) = 0.2 and p(B) =
sisting of the set of variables included in the equation (e.g. 0.3. For an event D, we know p(D|A) = 0.04,
{P (A), P (¬A)}) and the principle being applied (e.g. Com- p(D|B) = 0.03, and p(C|D) = 0.3. Determine
plement Theorem). Because we only represent valid equa- p(B|D).
clude them as they represent a fair proportion of the Andes
data. For instance, 62 of the 126 Andes states for Ex242
were singleton states.

Interestingly, while there were small variations among their
solutions, all of the Andes students choose to apply Bayes’
Rule rather than relying solely on the Conditional Probabil-
ity Theorem as suggested by Pyrenees. This, coupled with
the strong proportion of Pyrenees students who also chose
the Bayes’ Rule solution, indicates that the solution offered
by Pyrenees may be unintuitive for students, especially if
they have recently learned Bayes’ Rule. Again, this can be
interpreted as evidence that Pyrenees’ strong guidance did
have an impact on students’ problem solving strategies, but
it also raises concerns about how reasonable this guidance
will appear to the students. Regardless of one’s interpreta-
tion, an awareness of a trend like this can help inform the
evolution of model-based tutors like Andes and Pyrenees.

4.2 Comparing datasets
Figure 3: A graph representation of problems We now turn to qualitatively comparing the datasets. While
Ex252a, left, and Ex242, right. States and edges are it is not a common practice to directly compare data from
colored on a gradient from blue to yellow, indicating different tutors, we argue that it is appropriate, especially
the number of students who reached that state in the in this context. In longstanding tutoring projects it is com-
Andes and Pyrenees tutors respectively. Rounded mon for developers and researchers to make many substan-
edges indicate that at least one student from both tive changes. The Andes system itself has undergone sub-
tutors is present in a state. A green border indicates stantial interface changes over the course of its development
a solution state, and a pink border indicates that a [20]. These changes can alter student behavior in substan-
state is contained in the pedagogically “ideal” solu- tial ways, and it is important for researchers to consider
tion. Edge thickness corresponds to the natural log how they affect not just learning outcomes but also problem
of the number of problem attempts which included solving strategies, as was investigated by Chi et al. [4].
the given edge.
In many respects the close relationship between Andes and
Pyrenees makes them analogous to different versions of the
The problem is notable in the Pyrenees dataset because it same tutor and the presence of an isomorphic knowledge
was the only one in which the students were split almost base and problem set makes it possible for us to draw mean-
evenly among two solution paths. For most of the problems ingful comparisons between students. In this section we will
in the dataset the vast majority of students followed the inspect how the scaffolding design decisions made when con-
optimal solution path with only a few finding alternatives. structing the tutors affected the problem solving strategies
The ideal solution path, as suggested by Pyrenees’ domain exhibited by the students.
model, employed repeated applications of the Conditional
Probability Theorem: P (A ∩ B) = P (A|B)P (B), which the A visual inspection of the state graphs for each problem
problem was designed to teach. The students had been pre- revealed significant portions of each graph were shared be-
viously exposed to Bayes’ Theorem however, and over half tween the two datasets and portions that were represented
of them chose to apply it instead. This allowed them to in only one of the two. Despite the fact that students using
circumvent one variable definition and two applications of Andes were capable of reaching any of the states available
the Conditional Probability Theorem, achieving a slightly to students in Pyrenees, many Pyrenees states were never
shorter solution path. We make no argument here which discovered by Andes students. As noted in Section 4.1, this
path the tutor should encourage students to take, but it is suggests that guidance from the Pyrenees tutor is successful
worth noting that the Hint Factory, when trained on the in leading students down solution paths that they would not
Pyrenees data for this problem, recommends the shorter, otherwise have discovered, possibly applying skills that they
more popular path. would otherwise not have used.

The Andes dataset gives us a very different set of insights To quantify these findings, we calculated the relative sim-
into this problem. Because Andes lacks the strong pro- ilarity of students in each tutor. For a given problem, we
cess scaffolding of Pyrenees, students were able to make a defined the state-similarity between datasets A and B as the
wider variety of choices, leading to a graph with many more, probability that a randomly selected state from a student in
less populous states. While almost every state and edge in A will be passed through by a randomly selected student
the Pyrenees graph represents multiple students, the An- in B. Recall from Section 3.2 that our state representation
des graph contains a number of paths, including solutions, allows us to directly compare states across tutors. By this
that were reached by only one student. In some state-based definition, the self-similarity of a dataset is a measure of
analyses the authors choose to omit these singleton states, how closely its students overlap each other while the cross-
for instance when generating hints. We have chosen to in- similarity is a measure of how closely its students overlap
States Andes Pyrenees Solution States Andes Pyrenees Solution
Andes 0.551 (0.134) 0.494 (0.153) 0.478 (0.186) Andes 0.419 (0.150) 0.327 (0.139) 0.253 (0.172)
Pyrenees 0.460 (0.141) 0.688 (0.106) 0.669 (0.146) Pyrenees 0.372 (0.193) 0.709 (0.145) 0.601 (0.214)
Actions Andes Pyrenees Solution Actions Andes Pyrenees Solution
Andes 0.878 (0.085) 0.874 (0.118) 0.851 (0.140) Andes 0.818 (0.151) 0.582 (0.168) 0.636 (0.205)
Pyrenees 0.828 (0.117) 0.936 (0.021) 0.923 (0.036) Pyrenees 0.678 (0.212) 0.899 (0.038) 0.879 (0.055)

Table 1: Pairwise similarity across tutors and the Table 2: Pairwise similarity across tutors and the
ideal solution path. Similarities were calculated for ideal solution path calculated using a variable-free
each problem, and each cell lists the mean (and stan- state representation. Rows and columns are the
dard deviation) over all problems. The top half cov- same as in Table 1.
ers the state similarity metrics while the bottom half
of each table covers action similarity. Problem Andes Pyrenees
ex132 26 (47.5%) 2 (98.7%)
ex132a 13 (33.3%) 1 (100.0%)
with the other dataset. Similarity measures for the datasets ex144 2 (96.6%) 1 (100.0%)
can be found at the top of Table 1. ex152 11 (0.0%) 4 (0.0%)
ex152a 8 (59.0%) 3 (97.4%)
Predictably, both datasets have higher self-similarity than ex152b 12 (0.0%) 1 (100.0%)
cross-similarity, with Pyrenees showing higher self-similarity ex212 8 (71.4%) 1 (100.0%)
than Andes. This indicates that Pyrenees students chose ex242 9 (0.0%) 2 (49.38%)
more homogeneous paths to the goal. This is reasonable ex252 7 (76.9%) 2 (98.4%)
and consistent with the heavy scaffolding that is built into ex252a 4 (81.8%) 2 (98.6%)
the system. It is important to note that our similarity met- exc137 19 (0.0%) 2 (98.75%)
rics are not symmetric. The cross-similarity of Pyrenees
with Andes is higher than the reverse. This indicates that Table 3: For each problem, the tables gives the num-
the path taken by Pyrenees students are more likely to have ber of unique solution states represented in each tu-
been observed by Andes students than vice-versa. This has tor’s dataset. Note that there may exist many solu-
important implications for designers who are interested in tion paths which reach a given solution. The follow-
collecting data from a system that is undergoing modifica- ing percent (in parentheses) represents the percent
tions. If a system becomes increasingly scaffolded and re- solution paths that ended in the pedagogically ideal
strictive over time, past data will remain more relevant than solution.
in a system that is relaxed. In many ways this simply re-
flects the intuition that allowing students to explore a state
space more fully will produce more broadly useful data, and with the design goals of Pyrenees which was set up to guide
restricting students will produce data that is more narrowly students along the otherwise unfamiliar path of the TVS.
useful. Note that here we are only observing trends, and we
make no claims of statistical significance. We also opted to examine the impact of the variable defini-
tions on our evaluation. As noted above, variable definitions
In our analysis, we found that, within both datasets, many are an atomic action. They do not depend upon any event
solution paths or sub-paths differed only in the order that assertion and thus have no ordering constraints unlike the
actions were performed. In our domain, many actions do principles. We did so with the hypothesis that this would
not have ordering constraints. It is possible, for example, increase the similarity metrics for the datasets by eliminat-
to define the variables A and B in either order, and the re- ing the least constrained decisions from consideration. Our
sulting solution paths would deviate from one another. We results are shown in Table 2 below. Contrary to our ex-
thus sought to determine how much of the observed differ- pectations, this actually reduced the similarity both within
ence between our two datasets was due to these ordering and across the datasets, with the exception of Pyrenees’ self-
effects. To that end we define the action-similarity between similarity. Thus the unconstrained variable definitions did
datasets A and P as the probability that a randomly se- not substantially contribute to the dissimilarity. Rather,
lected action performed by a student in A will have been most of the variation lay in the order of principle applica-
performed by a by a randomly selected student in B. These tions.
values are shown in the bottom of Table 1, and each of the
trends observed for state-similarity hold, with predictably 4.3 Similarity to an “ideal" solution
higher similarity values. We now turn to exploring how well the ideal solution was
represented in the datasets. For both tutors the ideal solu-
It is notable that the similarity between Pyrenees and Andes tion is the pedagogically-desirable path constructed via the
is almost as high as Andes’ self-similarity, indicating that the TVS. Our measure of cross-similarity between two datasets
actions taken by Pyrenees students are almost as likely to can also be applied between a single solution path and a
be observed in Andes students as Pyrenees students. This dataset by treating the single solution as a set of one. We
suggests that, for the most part, the Andes students per- can thus measure the average likelihood of an ideal solution
formed a superset of the actions performed by the Pyrenees state appearing in a student’s solution from each dataset.
students. Thus the impact of Pyrenees is most visible in the The results of this calculation are shown in tables 1 and 2,
order of execution, not the actions chosen. This is consistent using both the state- and action-similarities explained ear-
lier. Predictably, the solution has a high similarity with
Pyrenees students, as these students are scaffolded tightly
and offered few chances to deviate from the path.

As Table 3 shows, Pyrenees students were funneled almost
exclusively to the ideal solution on the majority of prob-
lems, even if their paths to the solution were variable. We
found only one problem, Ex152, where the Pyrenees stu-
dents missed the ideal path. That was traced to a pro-
gramming error that forced students along a similar path.
Otherwise, there was only one problem, Ex242 (discussed
in Section 4.1), where a meaningful percentage of students
chose a different solution. The Andes students, by contrast,
were much less likely to finish in the ideal solution state, but
this was also problem-dependent.

4.4 Applications of the Hint Factory
Finally, having shown that the datasets differ, and that these
Figure 4: The four Cold Start curves, averaged
differences are consistent with the differing design choices of
across all problems. The x-axis shows the number of
the two tutors, we sought to determine what effect those
students used to train the model, and the y-access
differences would have on data-driven hint generation. Our
shows the percentage of a new student’s path that
goal was to determine how applicable a hint model of the
has available hints. The curve labeled “XvY” indi-
type produced by the Hint Factory would be for one dataset
cates training on the X dataset and selecting a new
if it was trained on another. To that end we performed a
student from the Y dataset (A = Andes; P = Pyre-
modified version of the Cold Start Experiment [1], which
nees).
is designed to measure the number of state-specific hints
that Hint Factory can provide given a randomly selected
dataset. The Cold Start experiment functions like leave-
one-out cross-validation for state-based hint generation. In system in the same domain. Clearly a substantive interface
the original Cold Start experiment, one student was selected and scaffolding change of the type made in Pyrenees can
at random and removed from the dataset, to represent a change the state space sufficiently that we cannot trivially
“new” student using the tutor. Each remaining student in rely on our prior data. On the other hand, while the cross-
the dataset was then added, one at a time, in a random application of data does have upper limits, those limits are
order to the Hint Factory’s model. On each iteration, the comparatively high. Clearly data from a prior system can be
model is updated and the percentage of states on the ‘new’ reused and can serve as a reliable baseline for novel system,
student’s path for which a hint is available is calculated. with the caveat that additional exploratory data is required.
This is repeated a desired number of times with new students
to account for ordering effects.
5. DISCUSSION AND CONCLUSION
For the present study we calculated cold-start curves for Our goal in this paper was to evaluate the application of
both the Pyrenees and Andes datasets. We also calculated data-driven methods such as the Hint Factory to model-
curves using the opposing dataset to illustrate the growth based tutoring systems. To that end we analyzed and com-
rate for cross-tutor hints. For these modified curves we se- pared datasets collected from two closely-related tutoring
lected the hint-generating students from the opposing dataset. systems: Andes and Pyrenees. Through our analysis we
All four curves are shown in Figure 4. Here AvA and PvP sought to: (1) evaluate the differences between closely-related
designate the within tutor curves for Andes and Pyrenees systems; (2) assess the impact of specific design decisions
respectively while PvA and AvP designate the cross-tutor made in those systems for user behaviors; and (3) evalu-
curves for hints from Pyrenees provided to Andes users and ate the potential application of data-driven hint generation
vice-versa. Figure 4 represents an average over all problems, across systems.
and therefore the x-axis extends only as far as the minimum
number of students to complete a problem. As the curves We found that, while the systems shared isomorphic domain
illustrate, the within-tutor curves reach high rates of cov- models, problems, and ideal solutions, the observed user be-
erage relatively quickly with PvP reaching a plateau above haviors differed substantially. Students using the Andes sys-
95% after 21 students and AvA reaching 85%. tem explored the space more widely, were more prone to
identify novel solutions, and rarely followed the ideal solu-
The cross-tutor curves, by contrast, reach much lower lim- tion path. Students in Pyrenees, by contrast, were far more
its. AvP reaches a plateau of over 75% coverage, while PvA homogeneous in their solution process and were limited in
reaches a plateau of 60% coverage. This reflects the same the alternative routes they explored. Contrary to our expec-
trends observed in Tables 1 and 2, where Andes better ex- tations, we found that this variation was not due to simple
plains the Pyrenees data than vice-versa; however, neither ordering variations in the simplest of steps but of alterna-
dataset completely covers the other. On the one hand this tive strategy selection for the higher-level domain principles.
result is somewhat problematic as it indicates that prior data This is largely consistent with the design decisions that mo-
has a limited threshold for novel tutors or novel versions of a tivated both systems and with the results of prior studies.
We also found that the state-based hint generation method Building Expert Systems. Addison-Wesley Publishing
used in the Hint Factory can be applied to the Andes and Company Inc., Reading, Massachusetts, U.S.A., 1983.
Pyrenees data given a suitable state representation. For this [10] A. Hicks, B. Peddycord III, and T. Barnes. Building
analysis we opted for a set-based representation given the Games to Learn from Their Players: Generating Hints
absence of strong ordering constraints across the principles. in a Serious Game. In Intelligent Tutoring Systems
We then completed a cold-start analysis to show that the (ITS), pages 312–317, 2014.
cross-tutor data could be used to bootstrap the construction [11] S. Katz, A. Lesgold, E. Hughes, D. Peters, G. Eggan,
of hints for a novel system but does not provide for complete M. Gordin, and L. Greenberg. Sherlock 2: An
coverage. intelligent tutoring system built on the lrdc
framework. In C. P. Bloom and R. B. Loftin, editors,
Ultimately we believe that the techniques used for data- Facilitating the Development and Use of Interactive
driven hint generation have direct application to model- Learning Environments, Computers, Cognition, and
based systems. Data-driven analysis can be used to iden- Work, chapter 10, pages 227 – 258. Lawrence Erlbaum
tify the behavioral differences between closely related sys- Associates, Mawah New Jersey, 1998.
tems and, we would argue, changes from one version of a [12] R. Kumar, M. E. Roy, B. Roberts, and J. I. Makhoul.
system to another. We also found that these changes can Toward automatically building tutor models using
be connected to the specific design decisions made during multiple behavior demonstrations. In
development. Further, we found that data-driven methods S. Trausan-Matu, K. E. Boyer, M. Crosby, and
can be applied to model-based tutoring data to generate K. Panourgia, editors, Proceedings of the 12th
state-based hints. We believe that hint information of this International Conference on Intelligent Tutoring
type may be used to supplement the existing domain models Systems, LNCS 8474, pages 535 – 544. Springer
to produce more user-adaptive systems. In future work we Verlag, 2014.
plan to apply these analyses to other appropriate datasets [13] B. Peddycord III, A. Hicks, and T. Barnes.
and to test the incorporation of state-driven hints or hint Generating Hints for Programming Problems Using
refinement to existing domain models. Intermediate Output. In Proceedings of the 7th
International Conference on Educational Data Mining
6. ACKNOWLEDGMENTS (EDM 2014), pages 92–98, 2014.
Work supported by NSF Grant #1432156 “Educational Data [14] C. Piech, M. Sahami, J. Huang, and L. Guibas.
Mining for Individualized Instruction in STEM Learning En- Autonomously Generating Hints by Inferring Problem
vironments” Min Chi & Tiffany Barnes Co-PIs. Solving Policies. In Learning at Scale (LAS), 2015.
[15] K. Rivers and K. Koedinger. Automatic generation of
7. REFERENCES programming feedback: A data-driven approach. In
[1] T. Barnes and J. Stamper. Toward Automatic Hint The First Workshop on AI-supported Education for
Generation for Logic Proof Tutoring Using Historical Computer Science (AIEDCS 2013), 2013.
Student Data. In Intelligent Tutoring Systems (ITS),
[16] K. Rivers and K. Koedinger. Automating Hint
pages 373–382, 2008. Generation with Solution Space Path Construction. In
[2] T. Barnes and J. C. Stamper. Automatic hint Intelligent Tutoring Systems (ITS), 2014.
generation for logic proof tutoring using historical [17] J. C. Stamper, M. Eagle, T. Barnes, and M. J. Croy.
data. Educational Technology & Society, 13(1):3–12, Experimental evaluation of automatic hint generation
2010.
for a logic tutor. I. J. Artificial Intelligence in
[3] M. Chi and K. VanLehn. Eliminating the gap between Education, 22(1-2):3–17, 2013.
the high and low students through meta-cognitive
[18] K. VanLehn, C. Lynch, K. G. Schulze, J. A. Shapiro,
strategy instruction. In Intelligent Tutoring Systems R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and
(ITS), volume 5091, pages 603–613, 2008. M. Wintersgill. The andes physics tutoring system:
[4] M. Chi and K. VanLehn. Meta-Cognitive Strategy Five years of evaluations. In C. Looi, G. I. McCalla,
Instruction in Intelligent Tutoring Systems: How, B. Bredeweg, and J. Breuker, editors, Artificial
When, and Why. Educational Technology & Society, Intelligence in Education - Supporting Learning
3(1):25–39, 2010. through Intelligent and Socially Informed Technology,
[5] A. Corbett. Cognitive computer tutors: Solving the Proceedings of the 12th International Conference on
two-sigma problem. In User Modeling 2001, pages Artificial Intelligence in Education, AIED 2005, July
137–147, 2001. 18-22, 2005, Amsterdam, The Netherlands, volume
[6] M. Eagle and T. Barnes. Exploring Differences in 125 of Frontiers in Artificial Intelligence and
Problem Solving with Data-Driven Approach Maps. In Applications, pages 678–685. IOS Press, 2005.
Educational Data Mining (EDM), 2014. [19] K. VanLehn, C. Lynch, K. G. Schulze, J. A. Shapiro,
[7] M. Eagle and T. Barnes. Exploring Networks of R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and
Problem-Solving Interactions. In Learning Analytics M. Wintersgill. The andes physics tutoring system:
(LAK), 2015. Lessons learned. I. J. Artificial Intelligence in
[8] D. Fossati, B. D. Eugenio, and S. Ohlsson. I learn Education, 15(3):147–204, 2005.
from you, you learn from me: How to make iList learn [20] K. VanLehn and B. van de Sande. The Andes physics
from students. In Artificial Intelligence in Education tutoring system: An experiment in freedom. Advances
(AIED), 2009. in intelligent tutoring systems., pages 421–443, 2010.
[9] F. Hayes-Roth, D. A. Waterman, and D. B. Lenat.
An Exploration of Data-Driven Hint Generation in an
Open-Ended Programming Problem

Thomas W. Price Tiffany Barnes
North Carolina State University North Carolina State University
890 Oval Drive 890 Oval Drive
Raleigh, NC 27606 Raleigh, NC 27606
twprice@ncsu.edu tmbarnes@ncsu.edu

ABSTRACT lems have a large, often infinite, space of possible states.
Data-driven systems can provide automated feedback in the Even a relatively simple programming problem may have
form of hints to students working in problem solving environ- many unique goal states, each with multiple possible solu-
ments. Programming problems present a unique challenge tion paths, leaving little overlap among student solutions.
to these systems, in part because of the many ways in which Despite this challenge, a number of attempts have been
a single program can be written. This paper reviews current made to adapt the Hint Factory to programming problems [9,
strategies for generating data-driven hints for programming 12, 16]. While these approaches have been generally success-
problems and examines their applicability to larger, more ful, they are most effective on small, well structured pro-
open-ended problems, with multiple, loosely ordered goals. gramming problems, where the state space cannot grow too
We use this analysis to suggest directions for future work to large. This paper explores the opposite type of problem: one
generate hints for these problems. that has a large state space, multiple loosely ordered goals,
unstructured output and involves creative design. Each of
1. INTRODUCTION these attributes poses a challenge to current data-driven hint
Adaptive feedback is one of the hallmarks of an Intelli- generation techniques, but they are also the attributes that
gent Tutoring System. This feedback often takes the form make such problems interesting, useful and realistic. In this
of hints, pointing a student to the next step in solving a paper, we will refer to these as open-ended programming
problem. While hints can be authored by experts or gener- problems. We investigate the applicability of current tech-
ated by a solver, more recent data-driven approaches have niques to these problems and suggest areas for future re-
shown that this feedback can be automatically generated search.
from previous students’ solutions to a problem. The Hint
Factory [18] has successfully generated data-driven hints in a The primary contributions of this paper are 1) a review of
number of problem solving domains, including logic proofs [2], current data-driven hint generation methods for program-
linked list problems [5] and a programming game [12]. The ming problems, 2) an analysis of those methods’ applica-
Hint Factory operates on a representation of a problem called bility to an open-ended problem and 3) a discussion of the
an interaction network [4], a directed graph where each ver- challenges that need to be addressed before we can expect
tex represents a student’s state at some point in the problem to generate hints for similar problems.
solving process, and each edge represents a student’s action
that alters that state. A solution is represented as a path 2. CURRENT APPROACHES
from the initial state to a goal state. A student request- Current approaches to generating data-driven programming
ing a hint is matched to a previously observed state and hints, including alternatives to the Hint Factory [10, 13, 17],
directed on a path to a goal state. The Hint Factory takes can be broken down into three primary components:
advantage of the intuition that students with the same ini-
tial state and objective will follow similar paths, producing a
well-connected interaction network. When this occurs, even 1. A representation of a student’s state and a method
a relatively small sample of student solutions can be enough for determining when one state can be reached from
to provide hints to the majority of new students [1]. another, meaning they are connected in the network
2. An algorithm that, given a student’s current state, con-
While problems in many domains result in well-connected structs an optimal path to a goal state. Often we sim-
networks, this is not always the case. Programming prob- plify this to the problem of picking the first state on
that path
3. A method to present this path, or next state, to the
student in the form of a hint

For the purposes of this paper, we limit our discussion to
the first step in this process. (For a good discussion of the
second step, see an analysis by Piech et al. [13], compar-
ing path selection algorithms.) While in some domains this
first step is straightforward, in programming tasks, espe- inition. Hicks and Peddycord [7, 12] used the Hint Factory
cially open-ended problems, it is likely the most challeng- to generate hints for a programming game called Bots, in
ing. The simplest approach is to take periodic snapshots of which the player writes a program to direct a robot through
a student’s code and treat these as states, connecting con- various tasks in a 3D level. They chose to represent the state
secutive snapshots in the network. However, because two of a player’s program as the final state of the game world
students’ programs are unlikely to match exactly, this ap- after the program was executed. They compared the avail-
proach is likely to produce a very sparse, poorly connected ability of hints when using this “world state” model with a
network, making it difficult to match new students to prior traditional “code state” model, and found that using world
solution attempts. A variety of techniques have been pre- states significantly reduced the total number of states and
sented to address this problem, which can be grouped into increased the availability of hints. The challenge with this
three main strategies: canonicalization, connecting states approach, as noted by the authors, is the generation of ac-
and alternative state definitions. tionable hints. A student may be more capable of making a
specific change to her code than determining how to effect
2.1 Canonicalization a specific change in the code’s output.
Canonicalization is the process of putting code states into
a standardized form, often by removing semantically unim- 3. AN OPEN-ENDED PROBLEM
portant information, so that trivial difference do not prevent The above techniques have all shown success on smaller,
two states from matching. Rivers and Koedinger [15] present well-structured problems, with ample data. We want to in-
a method for canonicalizing student code by first represent- vestigate their applicability to an open-ended problem, as
ing it as an Abstract Syntax Tree (AST). Once in this form, described in Section 1, where this is not the case. The pur-
they apply a number of functions to canonicalize the code, pose of this paper is not to create actionable hints, nor are
including normalizing arithmetic and boolean operators, re- we attempting to show the failures of current methods by
moving unreachable and unused code, and inlining helper applying them to an overly challenging task. Rather, our
functions. After performing this canonicalization on a set purpose is exploratory, using a small dataset to identify ar-
of introductory programming problems, they found that a eas of possible future work, and challenges of which to be
median 70% of states had at least one match in the net- mindful when moving forward with hint generation research.
work. Jin et al. [9] represent a program’s state as a Linkage
Graph, where each vertex is a code statement, and each di- We collected data from a programming activity completed
rected edge represents an ordering dependency, determined by 6th grade students in a STEM outreach program called
by which variables are read and assigned to in each state- SPARCS [3]. The program, which meets for half-day ses-
ment. This state representation allows the Hint Factory to sions approximately once a month during the school year,
ignore statement orderings which are not important to the consists of lessons designed and taught by undergraduate
execution of the program. Lazar and Bratko [10] use the and graduate students to promote technical literacy. The
actual text of Prolog code to represent a student’s state, class consisted of 17 students, 12 male and 5 female.
and then canonicalize the code by removing whitespace and
normalizing variable names. The activity was a programming exercise based on an Hour
of Code activity from the Beauty and Joy of Computing cur-
2.2 Connecting States riculum [6]. It was a tutorial designed to introduce novices
Even with canonicalization, a student requesting a hint may to programming for the first time. The exercise had users
not match any existing state in the network. In this case, we create a simple web-based game, similar to whack-a-mole,
can look for a similar state and create a connection between in which players attempt to click on a sprite as it jumps
them. Rivers and Koedinger [16] use normalized string edit around the screen to win points. The exercise was split into
distance as a similarity metric between two program states. 9 objectives, with tutorial text at each stage. Students were
They connect any two states in the network which have at not required to finish an objective before proceeding. A fin-
least 90% similarity, even if no historical data connect these ished project required the use of various programming con-
states. Additionally, they use a technique called path con- cepts, including events, loops, variables and conditionals.
struction to generate new solution paths from a given state The students used a drag-and-drop, block-based program-
to a nearby, unconnected goal state by searching for a series ming language called Tiled Grace [8], which is similar to
of insertions, deletions and edits to their AST that will trans- Scratch [14]. The user writes a program by arranging code
form it into the goal state [17]. They also use this method blocks, which correspond directly to constructs in the Grace
to discover new goal states which may be closer to the stu- programming language. The editor also supports switching
dent’s current state. Jin et al. [9] use a similar technique to textual coding, but this feature was disabled. A screen-
to transform their Linkage Graphs to better match the cur- shot of the activity can be seen in Figure 1.
rent state of a student when no direct matches can be found
in the interaction network. Piech et al. [13] use path con- During the activity, the students were allowed to go through
struction to interpolate between two consecutive states on the exercise at their own pace. If they had questions, the
a solution path which differ by more than one edit. This students were allowed to ask for help from the student vol-
is useful to smooth data when student code is recorded in unteers. Students were stopped after 45 minutes of work.
shapshots that are too far apart. Snapshots of a student’s code were saved each time it was
run and periodically throughout the session. Occasional
2.3 Alternate State Definitions technical issues did occur in both groups. One student had
Another approach is to forego the traditional code-based severe technical issues, and this student’s data was not an-
representation of a student’s state, and use an alternate def- alyzed (and is not reflected in the counts above). Students
Raw Canonical Ordered
Total States 2380 1781 1656
% Unique 97.5% 94.8% 92.8%
Mean NU Count 3.44 3.95 2.82
Median NU Count 2 2 2
Mean % Path Unique 89.9% 83.0% 78.9%
Standard Deviation (6.67) (10.5) (13.3)

Table 1: Various measures of the sparseness of the
interaction network for the raw, canonicalized, and
ordered-canonicalized state representations. Mean
and median NU counts refer to the number of stu-
dents who reached each non-unique state.

Figure 1: A screenshot of the Hour of Code activ- ent code states to be merged in the process. We therefore
ity that students completed. Students received in- see this as an upper bound on the value of removing unim-
structions on the left panel. The center panel was a portant orderings from a code state. We recomputed our
work area, and students could drag blocks in from metrics for the ordered-canonicalized interaction network as
the bottom-right panel. The top-right panel allowed well. The results can be seen in Table 1. Our later analyses
students to test their games. use the unsorted, canonicalized code representation.

These results indicate that canonicalization does little to re-
produced on average 148.5 unique code states and accom- duce the sparsity of the state space, with students spending
plished between 1 and 6 of the activity’s objectives, averag- most of their time in states that no other student has seen.
ing 3.2 objectives per student. For comparison, recall Rivers and Koedinger found 70% of
states in a simple programming problem had a match after
4. ANALYSIS canonicalization [15], though they were using a much larger
We attempted to understand the applicability of each of the dataset. In our dataset, it is unlikely that we would be able
techniques discussed in Section 2 to our dataset. However, to find a direct path from a new student’s state to a goal
since the output of our program was a game that involved state in order to suggest a hint.
nondeterminism, we felt it would be inappropriate to at-
tempt to represent a program’s state as the result of its 4.2 Connecting States
execution. We therefore focused on the first two strategies,
To address this, we explored the feasibility of connecting a
canonicalization and connecting states.
new student’s state to a similar, existing state in the net-
work. It is unclear how close two code states should be
4.1 Canonicalization before it is appropriate to connect them as in [16], or to
Our initial representation of a student’s code state was a generate a path between them as in [17]. It certainly de-
tree, where each code block was a node, and its children in- pends on the state representation and distance metric used.
cluded any blocks that were nested inside of it. In this way, Rather than identifying a cutoff and measuring how often
our representation was similar to Rivers and Koedinger’s these techniques could be applied, we chose to visualize the
ASTs. To get a baseline for the sparsity of our dataset, distance between two students and make qualitative obser-
we first analyzed the code states without performing any vations. Because our code states were already represented
canonicalization. We calculated the total number of states as trees, we used Tree Edit Distance (TED) as a distance
in the interaction network and the percentage which were metric. While Rivers and Koedinger reported better success
only reached by one student. Of those states reached by with Levenshtein distance [16], we believe that TED is the
multiple students, we calculated the mean and median num- most appropriate distance metric for block code, where tree
ber of students who reached them. We also calculated the edit operations correspond directly to user actions.
percentage of each student’s states that were unreached by
any other student in the dataset. For each pair of students, A and B, we created an N by M
distance matrix, D, where N is the number of states in A’s
We then canonicalized the data by removing variable names solution path, and M is the number of states in B’s solution
and the values of number and string literals. Our problem path. Di,j = d(Ai , Bj ), where d is the TED distance func-
featured very few arithmetic or logical operators, and these tion, Ai is the ith state of A and Bj is the jth state of B.
were generally not nested, so we did not normalize them, We used the RTED algorithm [11] to calculate the distance
as suggested by Rivers and Koedinger [15]. We reran our function, putting a weight of 1.0 on insertions, deletions and
analyses on the canonicalized data. To ensure that we had replacements. We also omitted any state which was identical
effectively removed all unimportant ordering information, to its predecessor state. We normalized these values by di-
we recursively sorted the children of each node in the tree. viding by the maximum value of Di,j , and plotted the result
This effectively removed any ordering information from a as an image. Three such images can be seen in Figure 2.
student’s code state, and kept only hierarchical information.
This is somewhat more extreme than the Linkage Graphs of We also calculated the “path” through this matrix that passes
Jin et al. [9], and it does allow two meaningfully differ- through the least total distance. This does not represent a
Mean Median Max Farthest
1 0.25 (0.27) 0.15 (0.29) 0.76 (0.56) 2.23 (0.75)
2 4.88 (3.93) 4.95 (4.34) 9.18 (5.74) 12.73 (6.10)
4 4.92 (2.77) 4.83 (2.78) 10.11 (3.69) 14.67 (4.77)
5 7.79 (1.32) 7.75 (1.41) 13.17 (1.72) 18.17 (1.72)
6 7.49 (1.11) 7.76 (1.37) 13.17 (0.98) 18.67 (1.75)

Table 2: For each objective, average distances (and
standard deviations) of minimum-distance student
pairs, using the mean, median and max metrics.
For reference, the final column represents the av-
erage maximum distance each student moved from
the start state while completing the objective.

A visual inspection of these matrices reveals that while many
student pairs are quite divergent, some show a notable close-
ness throughout the exercise. In order to quantify these
results, we developed a set of distance metrics between stu-
dents. First, the distance matrix and the minimum-distance
“path” were calculated for the two students. The path is
comprised of pairs of states, and for each pair, we recorded
the tree edit distance between the states. From this list of
distances, we calculated the mean, median and maximum
distances between the two students. We looked at each ob-
Figure 2: Three distance matrices, each comparing jective in the exercise, and isolated the relevant subpath of
two students, where each pixel represents the TED each student who completed that objective. We paired each
between two states. Lighter shades of gray indi- of these subpaths with the most similar subpath in the set,
cate smaller distances. The green/yellow line shows using the mean, median and max distance metrics. Table 2
the path through the matrix with minimized total shows the average values of these minimized pairs of stu-
distance, with yellow shades indicating smaller dis- dents, using each metric. Objective 3, and objectives 7-9
tances. Red crosses indicate where both students were omitted, as too few students completed them.
met an objective. The top-left figure compares two
students as they completed objectives 1-4 in the ex-
ercise. In this example, the minimum-distance line 5. DISCUSSION
crosses each objective. The top right figure also de- We have attempted to apply a meaningful canonicalization
picts two students completing objective 4, but the to our state space, which did serve to reduce the number
darker colors and straighter line indicate less align- of states by 30.4%. However, as seen in Table 1, even af-
ment. The bottom figure depicts two students com- ter the strongest canonicalization, over 90% of the states in
pleting objectives 1-6, with high alignment. the interaction network had only been reached by one stu-
dent, with an average 78.9% of the states in each student’s
solution path being unique to that student. It seems that
path through the interaction network, but rather an align- our approach to canonicalization is insufficient to produce a
ment between the states of student A and those of student meaningful reduction of the state space, though it is possible
B. Each pixel of the line represents a pairing of a state a more stringent canonicalization would be more effective.
from A with a state from B, such that these pairings are
contiguous and represent the smallest total distance. While Connecting existing states seems to be a more promising ap-
we do not suggest applying this directly as a strategy for proach, giving us the ability to link new states to previously
hint generation, it serves an a useful visual indicator of the observed states, even when they do not match exactly. Our
compatibility of two students for hinting purposes. distance matrices indicate that some students take parallel,
or slowly diverging solution paths, which suggests that they
An alternate approach would have been to pair each state in may be useful to each other in the context of hinting. As
the interaction network with its closest pair from any other shown in Figure 2, students are often closest together when
student, and use this as a measure of how sparse the network completing the same objective. This may seem self-evident,
was. We chose to compare whole students, rather than indi- but it does indicate that our distance metric is meaning-
vidual states, because we felt that the former could lead to ful. It is more difficult to put the actual TED values into
strange hinting behavior. Imagine, for instance, that a stu- context. Students get, on average, farther away from their
dent requests a hint, which initially points to a state from closest paired student as they complete more objectives, but
student B, but at the very next step requests a hint that this average distance does not exceed 8 tree edits during the
points instead to student C. Perhaps the attributes that first 6 objectives. To put that number into context, during
make the student’s state similar to that of B are different this same time students do not, on average, get more than
from those that make the state similar to C. The resulting 19 tree edits away from the start state. This suggest that
hints would be at best confusing, and at worst conflicting. there is certainly hint-relevant knowledge in these paired stu-
dents, but that it may be difficult to harness this knowledge [3] V. Cateté, K. Wassell, and T. Barnes. Use and
to generate a hint. development of entertainment technologies in after
school STEM program. In Proceedings of the 45th
ACM technical symposium on Computer science
5.1 Limitations education, pages 163–168, 2014.
It is important to note that this analysis is an exploratory [4] M. Eagle, M. Johnson, and T. Barnes. Interaction
case study, and makes no strong claims, only observations. Networks: Generating High Level Hints Based on
We studied data from only 17 novice programmers, and the Network Community Clustering. In International
problem we analyzed was highly complex, involving multiple Educational Data Mining Society, pages 164–167,
control structures, loosely ordered objectives, and unstruc- 2012.
tured output. This makes the problem quite dissimilar from [5] D. Fossati, B. D. Eugenio, and S. Ohlsson. I learn
previous problems that have been been studied in the con- from you, you learn from me: How to make iList learn
text of hint generation, making it difficult to determine what from students. In Artificial Intelligence in Education
observations should be generalized. (AIED), 2009.
[6] D. Garcia, B. Harvey, L. Segars, and C. How. AP CS
5.2 Future Work Principles Pilot at University of California, Berkeley.
Rivers and Koedinger [16], as well as Jin et al. [9] note ACM Inroads, 3(2), 2012.
the limitations of their methods for larger problems, and [7] A. Hicks, B. Peddycord III, and T. Barnes. Building
each suggest that breaking a problem down into subprob- Games to Learn from Their Players: Generating Hints
lems would help to address this. Lazar and Bratko [10] at- in a Serious Game. In Intelligent Tutoring Systems
tempted this in their Prolog tutor by constructing hints for (ITS), pages 312–317, 2014.
individual lines of code, which were treated as independent [8] M. Homer and J. Noble. Combining Tiled and Textual
subproblems. Similarly, we may be able to isolate the sub- Views of Code. In Proceedings of 2nd IEEE Working
section of a student’s code that is currently relevant, and Conference on Software Visualization, 2014.
treat this as an independent problem. The hierarchical na- [9] W. Jin, T. Barnes, and J. Stamper. Program
ture of block-based coding environments lends itself to this representation for automatic hint generation for a
practice, making it an appealing direction for future work. data-driven novice programming tutor. In Intelligent
Tutoring Systems (ITS), 2012.
Our work makes the simplifying assumption that unweighted [10] T. Lazar and I. Bratko. Data-Driven Program
TED is a reliable distance metric for code, but future work Synthesis for Hint Generation in Programming Tutors.
should investigate alternative metrics. This might include a In Intelligent Tutoring Systems (ITS). Springer, 2014.
weighted TED metric, which assigns different costs to inser- [11] M. Pawlik and N. Augsten. RTED: a robust algorithm
tions, deletions and replacements, or even to different types for the tree edit distance. Proceedings of the VLDB
of nodes (e.g. deleting a for-loop node might cost more than Endowment, 5(4):334–345, 2011.
a function call node). Regardless of the metric used, once [12] B. Peddycord III, A. Hicks, and T. Barnes.
two proximate states are identified, it is still an open ques- Generating Hints for Programming Problems Using
tion how this information can be best used for hint gener- Intermediate Output. In Proceedings of the 7th
ation. It is possible to construct a path between the states International Conference on Educational Data Mining
and direct a student along this path. However, future work (EDM 2014), pages 92–98, 2014.
might also investigate how to extract hint-relevant informa- [13] C. Piech, M. Sahami, J. Huang, and L. Guibas.
tion from one state and apply it to a similar state directly. Autonomously Generating Hints by Inferring Problem
Solving Policies. In Learning at Scale (LAS), 2015.
Because of the nature of our problem’s output, we did not
[14] M. Resnick, J. Maloney, H. Andrés, N. Rusk,
explore non-code-based state representations, as described
E. Eastmond, K. Brennan, A. Millner, E. Rosenbaum,
in Section 2.3. It would still be worth investigating how this J. Silver, B. Silverman, and Y. Kafai. Scratch:
might be applied to open-ended problems. For instance, a programming for all. Communications of the ACM,
code state could be represented as a boolean vector, indi- 52(11):60–67, 2009.
cating whether the code has passed a series of Unit Tests,
[15] K. Rivers and K. Koedinger. A canonicalizing model
and hints could direct the student to the current flaw in
for building programming tutors. In Intelligent
their program. However, creating actionable hints from this
Tutoring Systems (ITS), 2012.
information would pose a significant challenge.
[16] K. Rivers and K. Koedinger. Automatic generation of
programming feedback: A data-driven approach. In
6. REFERENCES The First Workshop on AI-supported Education for
[1] T. Barnes and J. Stamper. Toward Automatic Hint Computer Science (AIEDCS 2013), 2013.
Generation for Logic Proof Tutoring Using Historical [17] K. Rivers and K. Koedinger. Automating Hint
Student Data. In Intelligent Tutoring Systems (ITS), Generation with Solution Space Path Construction. In
pages 373–382, 2008. Intelligent Tutoring Systems (ITS), pages 329–339,
[2] T. Barnes, J. Stamper, L. Lehman, and M. Croy. A 2014.
pilot study on logic proof tutoring using hints [18] J. Stamper, M. Eagle, T. Barnes, and M. Croy.
generated from historical student data. In Proceedings Experimental evaluation of automatic hint generation
of the 1st Annual International Conference on for a logic tutor. Artificial Intelligence in Education
Educational Data Mining (EDM), pages 1–5, 2008. (AIED), 22(1):3–17, 2013.
Studio: Ontology-Based Educational Self-Assessment
Christian Weber Réka Vas
Corvinno Technology Corvinus University
Transfer Center of Budapest
Budapest, Hungary Budapest, Hungary
cweber@corvinno.com reka.vas@uni-corvinus.hu

ABSTRACT cognitive behavior. In his discussion, eight out of nine knowledge
Students, through all stages of education, grasp new knowledge types underline that knowledge in the scope of learning is
in the context of knowledge memorized all through their previous interrelated and strongly associated with previous experiences
education. To self-predict personal proficiency in education, self- [2]. As such, a supporting solution for self-assessment should
assessment acts as an important learning feedback. The in-house grasp and formalize the knowledge to assess in the context of
developed Studio suit for educational self-assessment enables to related knowledge.
model the educational domain as an ontology-based knowledge The Studio suit for educational self-assessment, presented in this
structure, connecting assessment questions and learning material paper, provides here a software solution for testing the personal
to each element in the ontology. Self-assessment tests are then proficiency in the context of related knowledge. It enables to
created by utilizing a sub-ontology, which frames a tailored model areas of education as a substantial source for assessment
testing environment fitting to the targeted educational field. In and narrows the gap between a potentially flawed self-prediction
this paper we give an overview of how the educational data is and the real proficiency, by offering an objective and adaptive
modeled as a domain ontology and present the concepts of online knowledge-test. To follow the natural learning process
different relations used in the Studio system. We will deduct how and enable an easy extension, the software embeds the assessed
the presented self-assessment makes use of the knowledge knowledge into a network of contextual knowledge, which
structure for online testing and how it adapts the test to the enables to adapt the assessment to the responses of the students.
performance of the student. Further we highlight where
potentials are for the next stages of development. This paper will give an overview of the Studio educational
domain ontology and the aspects of the system supporting
Keywords personalized self-assessment. Further it will highlight potentials
for data mining on the gathered educational data with an outlook
Education, adaptive test, self-assessment, educational ontology on the next stages of evaluation.

1. INTRODUCTION 2. THE STUDIO APPROACH FOR SELF-
Students exploring new fields of education are always confronted ASSESSMENT
with questions regarding their individual progress: how much do The basic concept of Studio is to model the focused education as
they know after iterations of learning, in which directions should an interrelated knowledge structure, which divides the education
they progress to fill the field most effectively, how to grasp the into sub-areas and knowledge items to know. The managed
outline and details of the field and how much of their time structure formalizes the relation between knowledge areas as a
should they invest in learning? Especially in higher education, learning context and models the requirements to master specific
where learning becomes a self-moderated, personalized process, parts of the education. This structure is used to create and
students are in need of continuous self-assessment to capture support knowledge tests for students. Through this combination
their current state of proficiency. At the same time, the of assessment and knowledge structure, the student gains the
unframed, informal self-prediction of students regarding their freedom to explore not only single knowledge items but the
personal skills is often substantive and systematically flawed [1]. education in the context of related knowledge areas, while the
Here a systematic and objective solution for self-assessment is embedded requirements are used to map the modeled knowledge
substantial to prevent a wrong or biased self-evaluation and to against the expected educational outcome.
support the self-prediction of the personal proficiency.
The assessment-system is designed to be accompanied by phases
Following Jonassen, knowledge in education could be split into of learning within the system, where the student gets access to
nine types across three categories to capture the human’s learning material, based on and supported by the test feedback.
This combined approach offers a unique self-assessment to the
students, where the backing knowledge context is used to adapt
the assessment in dependency of the test performance of the
student.
Before any regular examination students may use Studio to
assess their knowledge on their own. It is the tutor’s
responsibility to set the course of self-assessment test in Studio
system by selecting knowledge areas and sub-knowledge areas aspects of education as the curriculum or aspects relevant for the
which are relevant for the target education from the domain task of learning and course creation [8][9][10] or describe the
ontology. Then the frame will be automatically completed with design, use and retrieval of learning materials till creating
elements from the ontology which detail the selected knowledge courses [11], as well as directly the learner within the education
areas and are modeled as required for this part of the education. [12].
As the system stores assessment questions for each knowledge Within the area of educational ontologies, domain ontologies
element, Studio will then automatically prepare an assessment tend to model too specific details of the education, in an attempt
test, based on the defined selection and the domain ontology. The to model the specific field as complete as possible. This enables
resulting knowledge-test is then accessible as a self-assessment a comprehensive view on the field but it comes at the cost of
test for the student, who explores the backed knowledge generality, with the potential to be inflexible to handle changes.
structure, which pictures the expected learning outcome, in Other concepts model the education across different ontologies,
cycles of testing, reflection and learning. The process of test matching concepts like the learner, the education and the course
definition and assessment is shown in Figure 1, while the result description, introducing a broad horizon but with additional
preparation for reflection and learning is discussed in section 2.5. overhead to combine modelled insights and reason on new
instances.
The appeal of the Studio educational ontology is the size and
focus of the main classes and their relationships between each
other. The knowledge to learn is the main connecting concept in
the core of education. It enables a great flexibility to be
resourceful for different education related questions. An example
is here the business process management extension PROKEX,
which maps process requirements against knowledge areas to
create assessment test, reflecting the requirements of attached
processes [13].
An important factor in learning is the distance between the
expectation of the tutor and the learning performance of the
student. Here a short cycle of repeated assessment and learning
is a major factor for a better personal learning performance [14].
This aspect directly benefits from the focused concentration on
knowledge-areas as the main exchange concept between students
and tutors. As even further the close connections between
learners and educators via direct tutoring is one major enabler for
computer aided systems [15], each step towards a more direct
interaction through focused concepts is an additional supporter.
The class structure fuses the idea of interrelated knowledge with
a model of the basic types of educational concepts, involved in
situations of individual learning. Figure 2 visualizes the class
concepts as knowledge elements, together with the relation types,
used to model the dependencies between different aspects of
knowledge and learning within the educational ontology.
The Knowledge Area is the super-class and core-concept of the
ontology. The ontology defines two qualities of main relations
between knowledge areas: Knowledge areas could be a sub-
knowledge area of other knowledge areas with the “has_sub-
Figure 1: The overall design, assess and reflection cyle of the knowledge_area” relation or be required for another knowledge
system. area with the “requires_knowledge_of” relation. A knowledge
area may have multiple connected knowledge areas, linked as a
2.1 The Educational Domain Ontology requirement or sub-area. The “requires_knowledge_of” relation
The Studio system is based on a predesigned educational
defines that a node is required to complete the knowledge of a
ontology, explained in detail by Vas in [3]. Domain ontology is a parent knowledge area. This strict concept models a requirement
frequently used term in the field of semantic technologies and
dependency between fields of knowledge in education and yields
underlines the storage and conceptualization of domain
the potential to assess perquisites of learning, analog to the basic
knowledge and is often used in a number of projects and
idea of perquisites within knowledge spaces, developed by
solutions [4][5][6] and could address a variety of domains with Falmagne [16].
different characteristics in their creation, structure and
granularity, depending on the aim and the modeling person [7]. A Education is a structured process which splits the knowledge to
specialization in terms of the field is the educational domain learn into different sub-aspects of learning. Knowledge areas in
ontology which is a domain ontology adapted to the area and the ontology are extended by an additional sub-layer of
concepts of education. They could target to model different knowledge elements in order to effectively support educational
and testing requirements. Figure 2 visualizes the sub-elements 2.3 Creating and Maintaining Tests
and their relations. By splitting the assessed knowledge into sub- The creation and continues maintenance of the domain ontology
concepts, the coherence and correlation of self-assessment is a task of ontology engineering. The ontology engineer (the
questions could be expressed more efficiently and with the ontologist), creates, uses and evaluates the ontology [17], with a
potential of a more detailed educational feedback. strong focus on maintaining the structure and content. Within
Studio, this process is guided and supported by a specialized
Has sub- Requires administration workflow and splits in three consecutive task
Knowledge Area areas, in line with decreasing access rights:
knowledge area knowledge of

Ontology engineering (instance level): The creation
and linking of instances of the existing knowledge-area
Part of Part of classes into the overall domain ontology.
Part of
Test definition: Knowledge areas, which are relevant
to a target self-assessment test, are selected and
Refers grouped into specialized containers called Concept
Basic Concept Example
to
Refers to Groups (CG). These concept groups are organized into
a tree of groups, in line with the target of the
assessment. The final tree in this regards captures a
Refers to
sub-ontology. Concept groups are internally organized
based on the overall ontology and include all relations
Pr
Co

em
nc

between knowledge elements, as defined within the
ise
lu
s

domain ontology.
io
n

Theorem Refers
to Question and learning material creation: Questions
and learning materials alike are directly connected to
single knowledge areas within the designed test frame
and get imported, if already existing, from the domain
Test Learning ontology. More questions and learning materials are
Questions Material defined now, in line with the additional need of the
Testbank
targeted education and are available for future tests.
The pre-developed structure of classes and relations is fixed as
Figure 2: Model of the educational ontology. the central and integral design of the system. A view of the
Theorems express in a condensed and structured way the system interface for administration is provided in Figure 3. The
fundamental insights within knowledge areas. They fuse and left area shows the visualization of the current ontology section
explain the basic concepts of the depicted knowledge and set in revision and the right area shows the question overview with
them in relation to the environment of learning with examples. editing options. Tabs give access to additional editing views,
Multiple theorems could be “part_of” a knowledge area. Each including the learning material management and interfaces to
theorem may define multiple Basic Concepts as a “premise” or modify relations between nodes and node descriptions.
“conclusion”, to structure how the parts of the knowledge area
are related. Examples enhance this parts as a strong anchor for 2.4 Adaptive Self-Assessment
self-assessment questions and “refer_to” the theorems and basic To prepare an online self-assessment test, the system has to load
concepts as a “part_of” one or more knowledge areas. the relevant educational areas from the domain ontology and
extract the questions and relations of the filtered knowledge
2.2 The Testbank areas.
In order to connect the task of self-assessment with the model of The internal test algorithm makes use of two assumptions:
the educational domain, the system integrates a repository of
assessment questions. Each question addresses one element of Knowledge-area ordering: As the main knowledge
the overall knowledge and is directly associated with one areas are connected through “requires_knowledge_of”
knowledge area or knowledge element instance within the and “part_of” relations, every path, starting with the
ontology. The domain ontology provides here the structure for the start-element, will develop on average from general
online self-assessment while the repository of questions concepts to detailed concepts - given that the concept
supplements the areas as a test bank. The target of the self- groups in the test definition are also selected and
assessment is to continuously improve the personal knowledge ordered to lead from general to more detailed groups.
within the assessed educational areas, by providing feedback on Knowledge evaluation dependency: If a person,
the performance after each phase of testing. To do so, the Studio taking the test, fails on general concepts he or she will
system includes Learning Material connected to the test bank and potentially also fail on more detailed concepts. Further,
the knowledge areas, analog to the test questions. The learning if a high number of detailed concepts are failed, the
material is organized into sections as a structured text with parent knowledge isn’t sufficiently covered and will be
mixed media, as pictures and videos, and is based on a wiki- derived as failed, too.
engine to maintain the content, including external links.
Figure 3: The main ontology maintenance and administration interface, showing a part of the domain ontology.
The filtering is done based on the selection of a tutor, acting as same level. If the learner’s answer is correct, the system will
an expert for the target educational area. The tutor chooses activate the child elements of the current node and draw a
related areas, which are then created as a Test Definition, random question from the first left child.
containing Concept Groups, as described in section 2.3. The Based on the tree shaped knowledge structure, the assessment
system then uses the test definition as a filtering list to extract now follows these steps to run the self-assessment, supported by
knowledge areas. After the extraction, the structure is cached as the extracted knowledge structure:
a directed graph, while the top element of the initial concept
group is set as a start element. Beginning with the start-element, 1. Starting from the start-element, the test algorithm will
the test will move then through the graph, while administering activate the child knowledge-areas of the start element.
the questions connected to knowledge areas and knowledge 2. The algorithm now selects the first child-knowledge
elements. area and draws a random question out of the pool of
The loading of knowledge-elements follows three steps: available questions for this specific knowledge-element
from the test bank.
1. Each type of relation between two knowledge-elements
implements a direction for the connection. Assuming 3. If the learner fails the question, the algorithm will
the system loads all relations, starting with the start- mark the element as failed and select the next
element and ending on a knowledge-element, this knowledge area from the same level. If the learner’s
creates a two level structure where the start-node is a answer is correct, the system will activate the child
parent-element and all related, loaded elements are elements of the current node and trigger the process for
child-elements, as seen below in Figure 4. each child-element.
2. The loading algorithm then selects one child-element
and assumes it as a start-element and repeat the
loading process of knowledge-elements.
3. When no knowledge-elements for a parent-element
could be loaded, the sub-process stops. When all sub-
processes have stopped, the knowledge structure is
fully covered.
The test algorithm will now activate the child knowledge areas of
the start element and select the first knowledge area to the left
and draw a random question from the selected knowledge area. If Figure 4: Excerpt from the sub-ontology visualization, with
the learner fails the question, the algorithm will mark the the visible parent-child relationship, as used in the data-
element as failed and selects the next knowledge area from the loading for preparing the self-assessment.
An example question is shown below in Figure 5. Further JavaScript framework [18]. The visualization itself is a custom
following the testing algorithm, the system dives down within the build, similar to the Ext JS graph function “Radar” and based on
domain ontology and triggers questions depending on the the idea of Ka-Ping, Fisher, Dhamija and Hearst [19]. All views
learner’s answers and the extracted model of the relevant are able to zoom in and out of the graph, move the current
education. In this regards the Studio system adapts the test on the excerpt and offer a color code legend, explaining the meaning of
fly to the performance of the learner. Correlating to the idea of the colored nodes. In comparison with state of the art, the
adaptation, the learner will later gain access to learning material interface offers no special grouping or additional visualization
for each mastered knowledge area. As the learner continues to features like coding information into the size of nodes. Each
use the self assessment to evaluate the personal knowledge, he or interface offers an additional textual tree view to explore the
she will thus explore different areas of the target education, knowledge-elements or concept groups in a hierarchical listing.
following their individual pace of learning. This simple, straightforward approach for visualization correlates
with the goal of a direct and easy to grasp feedback through
interfaces which have a flat learning curve and enable to catch
the functionality in a small amount of time.
While this simple visualization is sufficient for the reasonable
amount of knowledge-elments within the result view, this alone
is not suitable for the domain ontology administration interface,
Figure 5: Test interface with a random drawn test question. as seen in Figure 3. Here Studio realizes methodologies to filter
and transform the data to visualize. To do so it makes use of two
2.5 Test Feedback and Result Visualization supporting mechanisms:
An important aspect of the system is the test feedback and The maximum-level-selector defines the maximum
evaluation interface. The educational feedback is one of the main level the system extracts from the domain ontology for
enabler for the student to grasp the current state and extend of full screen visualization.
the personal education. The domain ontology models the
structure and the dependencies of the educational domain, and In combination with the maximum level, the ontologist
the grouped test definition extracts the relevant knowledge for could select single elements within the domain
the target area or education. As such, the visualization of the ontology. This triggers an on-demand re-extraction of
ontology structure extracted for the test, together with the the visualized data, setting the selected knowledge-
indication of correct and incorrect answers, represents a map of element as the centre element. The system then loads
the knowledge of the learner. the connected nodes, based on their relations into the
orientation circles till the maximum defined level is
Throughout each view onto the ontology, the system uses the reached. More details about the transformation are in
same basic visualization, making use of the Sencha Ext JS [19].

Figure 6: Result visualization as educational feedback for the learner.
Together, this selection and transformation mechanism enables Each event stores information about the system in 7
the fluent navigation within the complete domain ontology dimensions, as described in Table 1 below:
structure, while re-using the same visualization interface. Table 1: Event blueprint to store events concerning system
Figure 6 shows the main view of the result interface. The left interaction.
area shows the sub-ontology extracted for the test, while the
Attribute Description
colored nodes represent the answers to the administered
questions. A red node visualizes wrong answers, while orange Event description code Which type of event and what
nodes are rejected nodes with correct answers but with an factors are relevant.
insufficient number of correctly answered child nodes, Location code On which part of the assessment-
indicating a lack of the underlying knowledge. Green nodes process or interface the event has
represent accepted nodes with correct answers and a sufficient occurred.
amount of correctly answered questions for child nodes. Grey
nodes are not administered nodes, which were not yet reached Session identifier Each access of the system is one
by the learner, as higher order nodes had no adequate session for one user.
acceptance. Numerical value storage Multi-purpose field, filled
Even though the target of the system is not a strict evaluation in depending on the event type.
number, the evaluation of the percentage of solved and String value storage Multi-purpose field, filled
accepted knowledge elements helps the learner to track the depending on the event type.
personal progress and could additionally be saved as a report
for further consultation. Besides providing an overview of the Event-time The time of the start of the
event.
self-assessment result, the result interface gives access to the
integrated learning material. For every passed node, the learner Item reference A unique reference code,
can now open the correlated material and intensify the identifying the correlated item
knowledge for successful tested areas. within the ontology. E.g. a
Retaking the test in cycles of testing and learning, while question or a knowledge-element
adapting the educational interaction, is the central concept of ID.
the Studio approach for self-assessment. As a consequence the All events are stored in order of their occurrence, so if no
system will not disclose the right answers to questions or explicit end event is defined, the next event for the same
learning material for not yet administered knowledge areas, to session and user is acting as the implicit end date. Extending
promote an individual reflection on the educational content the existing storage of information within Studio, the new
outside of a flat memorization of content. logging system stores the additional events, as shown in Table
2 below:
3. SYSTEM EVALUATION
Table 2: Assessment events and descriptions.
The system has been used, extended and evaluated in a number
of European and nationally funded research projects, including Event type Description
applications in business process management and innovation- START_TEST Marks the start of a test.
transfer [20], medical education [21] and job market
competency matching [22]. END_TEST Marks the end of a test.
Currently the system is being evaluated based on a running OPEN_WELCOME_LM The user opened the welcome
study with 200 university students in the field of business page.
informatics. The study will conclude on two current research OPEN_LM_BLOCK The student opened a learning
streams which are improving the systems testing and analysis material block on the test
capability. The first direction looks into potentials for the interface.
integration of learning styles into adaptive learning systems to
offer valuable advice and instructions to teachers and students OPEN_LM The student opened the learning
[23]. Within the second direction the question is challenged on material tab on the test interface.
how to adapt the presented self-assessment further towards the RATE_LM The student rated the learning
performance of the students, based on extracting assessment material.
paths from the knowledge structure [24].
CHECK_RESULT The student opened a result page.
For each running test, Studio collects basic quantitative data
about the number of assigned questions, how often tests are CONTINUE_TEST The student submitted an answer.
taken and how many students open which test and when. This FINISH_TEST The test has been finished.
is completed by qualitative measures, collecting which
SUSPEND_TEST The user suspended the test.
questions and knowledge elements the students passed or
failed. To conclude further on the mechanisms and impacts of RESUME_TEST The user has restarted a previously
Studio within the current study, a new logging system was suspended test.
developed, collecting the interaction with the system and SELECT_TEST_ALGO- The algorithm used to actually test
detailed information about the feedback as detailed events. RITHM the student is selected.
TEST_ALGORITHM_- The behavior of the current test text,” Expert Systems with Applications, vol. 34, no. 2, pp.
EVENT algorithm changes, e.g. entering 1474–1480, Feb. 2008.
another stage of testing. [5] S.-H. Wu and W.-L. Hsu, “SOAT: a semi-automatic
domain ontology acquisition tool from Chinese corpus,” in
ASK_TESTQUESTION Sends out a test question to the
user to answer. Proceedings of the 19th international conference on
Computational linguistics-Volume 2, 2002, pp. 1–5.
STUDIO_LOGOUT The user logs out of the Studio [6] M. Missikoff, R. Navigli, and P. Velardi, “The Usable
system. Ontology: An Environment for Building and Assessing a
To store the events, the system implements an additional Domain Ontology,” in The Semantic Web — ISWC 2002, I.
logging database, splitting the concepts of the logging to a star- Horrocks and J. Hendler, Eds. Springer Berlin Heidelberg,
schema for efficient extraction, transformation and loading. The 2002, pp. 39–53.
logging system is modular and easy to extend with new [7] T. A. Gavrilova and I. A. Leshcheva, “Ontology design
concepts and easy to attach to potential event positions within and individual cognitive peculiarities: A pilot study,”
the Studio runtime. Together with the existing logging of the Expert Systems with Applications, vol. 42, no. 8, pp.
assessment evaluation feedback, this new extension tracks the 3883–3892, May 2015.
exploration of the sub-ontology within the assessment and [8] S. Sosnovsky and T. Gavrilova, “Development of
enriches the feedback data with context information of the Educational Ontology for C-programming,” 2006.
students behavior on the system. [9] V. Psyché, J. Bourdeau, R. Nkambou, and R. Mizoguchi,
“Making learning design standards work with an ontology
4. NEXT STEPS of educational theories,” in 12th Artificial Intelligence in
The domain ontology offers a functional and semantically rich Education (AIED2005), 2005, pp. 539–546.
core for supporting learning and education. Yet not all the [10] T. Nodenot, C. Marquesuzaá, P. Laforcade, and C.
semantic potentials are fully leveraged to support and test the Sallaberry, “Model Based Engineering of Learning
learner’s progress. The “requires_knowledge_of” relation- Situations for Adaptive Web Based Educational Systems,”
requirement is a potential start-concept to model sub-areas as in Proceedings of the 13th International World Wide Web
groups which together compose the dependency. This could act Conference on Alternate Track Papers &Amp; Posters,
as an additional input for the assessment, where the system New York, NY, USA, 2004, pp. 94–103.
derives more complex decision how to further explore the [11] A. Bouzeghoub, C. Carpentier, B. Defude, and F.
related parts of the structure [25]. This could also be visualized, Duitama, “A model of reusable educational components
enabling the learner to grasp the personal knowledge as a for the generation of adaptive courses,” in Proc. First
visible group of concepts. International Workshop on Semantic Web for Web-Based
Besides giving colors to the different types of relations, the Learning in conjunction with CAISE, 2003, vol. 3.
visualizing of edges between knowledge areas is yet unfiltered, [12] W. Chen and R. Mizoguchi, “Learner model ontology and
offering no further support for navigation. A next stage of learner model agent,” Cognitive Support for Learning-
implementation could be the introduction of a visual ordering Imagining the Unknown, pp. 189–200, 2004.
and grouping of knowledge areas and relations. Underlying [13] G. Neusch and A. Gábor, “PROKEX – INTEGRATED
relations of sub-nodes could be interpreted visually through the PLATFORM FOR PROCESS-BASED KNOWLEDGE
thickness of relations between nodes, easing the perception of EXTRACTION,” ICERI2014 Proceedings, pp. 3972–
complex parts of the domain ontology, especially within 3977, 2014.
administration and maintenance tasks. [14] H. L. Roediger III, “Relativity of Remembering: Why the
Laws of Memory Vanished,” Annual Review of
The feedback of the current evaluation study of Studio will
Psychology, vol. 59, no. 1, pp. 225–254, 2008.
provide additional insights into the usage of the system by the
[15] J. D. Fletcher, “Evidence for learning from technology-
students. Based on this new data it is possible to mine profiles
assisted instruction,” Technology applications in
over time on the knowledge structure. One major application is
education: A learning view, pp. 79–99, 2003.
here the creation of behavior profiles, as proposed in [23].
[16] J.-C. Falmagne, M. Koppen, M. Villano, J.-P. Doignon,
and L. Johannesen, “Introduction to knowledge spaces:
5. REFERENCES How to build, test, and search them,” Psychological
[1] D. Dunning, C. Heath, and J. M. Suls, “Flawed self- Review, vol. 97(2), pp. 201–224, 1990.
assessment implications for health, education, and the [17] F. Neuhaus, E. Florescu, A. Galton, M. Gruninger, N.
workplace,” Psychological science in the public interest, Guarino, L. Obrst, A. Sanchez, A. Vizedom, P. Yim, and
vol. 5, no. 3, pp. 69–106, 2004. B. Smith, “Creating the ontologists of the future,” Applied
[2] D. Jonassen, “Reconciling a Human Cognitive Ontology, vol. 6, no. 1, pp. 91–98, 2011.
Architecture,” in Constructivist Instruction: Success Or [18] “Ext JS API.” [Online]. Available:
Failure?, S. Tobias and T. M. Duffy, Eds. Routledge, http://www.objis.com/formationextjs/lib/extjs-
2009, pp. 13–33. 4.0.0/docs/index.html. [Accessed: 04-May-2015].
[3] R. Vas, “Educational Ontology and Knowledge Testing,” [19] K.-P. Yee, D. Fisher, R. Dhamija, and M. Hearst,
Electronic Journal of Knowledge Management, vol. 5, no. “Animated exploration of dynamic graphs with radial
1, pp. 123 – 130, 2007. layout,” in IEEE Symposium on Information
[4] M. Y. Dahab, H. A. Hassan, and A. Rafea, “TextOntoEx: Visualization, 2001. INFOVIS 2001, 2001, pp. 43–50.
Automatic ontology construction from natural English
[20] M. Arru, “Application of Process Ontology to improve the [23] H. M. Truong, “Integrating learning styles and adaptive e-
funding allocation process at the European Institute of learning system: Current developments, problems and
Innovation and Technology.,” in 3rd International opportunities,” Computers in Human Behavior, 2015.
Conference on Electronic Government and the [24] C. Weber, “Enabling a Context Aware Knowledge-Intense
Information Systems Perspective (EGOVIS)., Munich, Computerized Adaptive Test through Complex Event
Germany, 2014. Processing,” Journal of the Sientific and Educational on
[21] M., M., Ansari, F., Dornhöfer Khobreh and M. Fathi, “An Forum on Business Information Systems, vol. 9, no. 9, pp.
ontology-based Recommender System to Support Nursing 66–74, 2014.
Education and Training,” in LWA 2013, 2013. [25] C. Weber and R. Vas, “Extending Computerized Adaptive
[22] V. Castello, L. Mahajan, E. Flores, M. Gabor, G. Neusch, Testing to Multiple Objectives: Envisioned on a Case
I. B. Szabó, J. G. Caballero, L. Vettraino, J. M. Luna, C. from the Health Care,” in Electronic Government and the
Blackburn, and F. J. Ramos, “THE SKILL MATCH Information Systems Perspective, vol. 8650, A. Kő and E.
CHALLENGE. EVIDENCES FROM THE SMART Francesconi, Eds. Springer International Publishing, 2014,
PROJECT,” ICERI2014 Proceedings, pp. 1182–1189, pp. 148–162.
2014.
Graph Analysis of Student Model Networks

Julio Guerra Yun Huang
School of Information Sciences Intelligent Systems Program
University of Pittsburgh University of Pittsburgh
Pittsburgh, PA, USA Pittsburgh, PA, USA
jdg60@pitt.edu yuh43@pitt.edu

Roya Hosseini Peter Brusilovsky
Intelligent Systems Program School of Information Sciences
University of Pittsburgh University of Pittsburgh
Pittsburgh, PA, USA Pittsburgh, PA, USA
roh38@pitt.edu peterb@pitt.edu

ABSTRACT same for all students and at all times. In this work we ex-
This paper explores the feasibility of a graph-based approach plore the idea that it might be beneficial for a student model
to model student knowledge in the domain of programming. to include connections between domain KEs that represent
The key idea of this approach is that programming concepts some aspects of individual student knowledge rather than
are truly learned not in isolation, but rather in combina- domain knowledge. This idea is motivated by the recogni-
tion with other concepts. Following this idea, we represent tion that the mastery in many domains is reached as the
a student model as a graph where links are gradually added student practices connecting different KEs, i.e., each KE is
when the student’s ability to work with connected pairs of practiced in conjunction with other KEs. To address this, we
concepts in the same context is confirmed. We also hypothe- build a model represented as a network of KEs that get pro-
size that with this graph-based approach a number of tradi- gressively connected as the student successfully works with
tional graph metrics could be used to better measure student problems and assessment items containing multiple KEs. As
knowledge than using more traditional scalar models of stu- the student succeeds in more diverse items mapped to dif-
dent knowledge. To collect some early evidence in favor of ferent KEs, her model gets better connected.
this idea, we used data from several classroom studies to
correlate graph metrics with various performance and moti- To explore the value of this graph-based representation of
vation metrics. student knowledge, we compute different graph metrics (e.g.,
density, diameter) for each student and analyze them in re-
1. INTRODUCTION lation to student performance metrics and attitudinal ori-
Student modeling is widely used in adaptive educational sys- entations drawn from a motivational theory. This analysis
tems and tutoring systems to keep track of student knowl- was performed using data collected from 3 cohorts of a Java
edge, detect misconceptions, provide targeted support and programming course using the same system and the same
give feedback to the student [2]. The most typical overlay content materials. In the remaining part of the paper, we
student model dynamically represents the inferred knowl- describe related work, introduce and illustrate the suggested
edge level of the student for each knowledge element (KE) approach, describe graph and performance metrics, and re-
(also called knowledge component or KC) defined in a do- port the results of the correlation analysis.
main model. These knowledge levels are computed as the
student answers questions or solves problems that are mapped
to the domain KEs. Student models are frequently built over
2. RELATED WORK
networked domain models where KEs are connected by pre- Graph representation of student activity is not new. The
requisite, is-a, and other ontological relationships that are 2014 version of the Graph-Based Educational Data Mining
used to propagate the knowledge levels and produce a more Workshop 1 contains two broad types of related work: the
accurate representation of the knowledge of the learner. Since analysis of the networking interaction among students, for
these connections belong to domain models, they stay the example work on social capital [14] and social networking in
MOOCs [3, 12]; and analyses of learning paths over graph
representation of student traces while performing activities
in the system [1, 5]. Our work fits in the second type since
we model traces of each student interacting with the sys-
tem. However, our approach is different as it attempts to
combine an underlying conceptual model with the traces of
the student learning.

1
http://ceur-ws.org/Vol-1183/gedm2014_proceedings.
pdf
A considerable amount of work focused on graph repre-
sentation of domain models that serve as a basis for over-
lay student models. The majority of this work focused on
constructing the prerequisite relationships between domain
knowledge components (concept, skills) [6, 13]. In this case
links established between a pair of concepts represent prereq-
uisite - outcome relationship. Another considerable stream
of work explored the use of formal ontologies with such re-
lationships as is-a and part-of for connecting domain knowl-
edge components [7]. Ontological representation, in turn,
relates to another stream of work that applies graph tech-
niques to structural knowledge representation, for example
by analyzing the network properties of ontologies [9]. Figure 1: Exercise jwhile1

The research on graph-based domain models also leads to
a stream of work on using Bayesian networks to model the
relationships between domain concepts for knowledge prop- path length, and average node weight can be good indicators
agation in the process of student modeling [15, 4]. Yet, in of student knowledge compared to the amount of activities
both cases mentioned above links between knowledge com- done or overall measures of assessment like success rate on
ponents were not parts of individual student model, but ei- exercises. We further explore these graph metrics in relation
ther parts of the domain model or student modeling pro- with motivational factors drawn from a learning motivation
cess and thus remain the same for all students. In con- theory.
trast, the approach suggested in this paper adds links be-
tween knowledge components to individual student models 3.1 Domain Model and Content
to express combinations of knowledge components that the
Our content corpus is composed by a set of 112 interactive
given student explored in a problem solving or assessment
parameterized exercises (i.e., questions or problems) in the
process. This approach is motivated by our belief that in
domain of Java programming from our system QuizJet [11].
the programming domain, student knowledge is more effec-
Parameterized exercises are generated from a template by
tively modeled by capturing student progress when students
substituting a parameter variable with a randomly generated
needed to apply multiple concepts at the same time.
value. As a result each exercise can be attempted multiple
times. To answer the exercise the student has to mentally
3. THE APPROACH execute a fragment of Java code to determine the value of a
The idea behind our approach is that knowledge is likely to specific variable or the content printed on a console. When
be stronger for concepts which are practiced together with the student answers, the system evaluates the correctness,
a larger variety of other concepts. We hypothesize, for ex- reports to the student whether the answer was correct or
ample, that a student who solves exercises, in which the wrong, shows the correct response, and invites the student
concept for-loop is used with post-incremental operator and to “try again”. As a result, students may still try the same
post-decremental operator will have a better understanding exercises even after several correct attempts. An example of
of for-loop than another student who practices (even the parameterized java exercise can be seen in Figure 1.
same amount of times) the for loops concept in a more nar-
row context, i.e., only with post-incremental operator. To In order to find the concepts inside all of the exercises, we
represent our approach, for each student we build a graph- used a parser [10] that extracts concepts from the exercise’s
based student model as a network of concepts where the template code, analyzes its abstract syntax tree (AST), and
edges are created as the student succeeds in exercises con- maps the nodes of the AST (concepts extracted) to the nodes
taining both concepts to be connected. The Domain Model in a Java ontology 2 . This ontology is a hierarchy of pro-
defining the concept space and the mapping between the gramming concepts in the java domain and the parser uses
concepts and programming exercises is explained in the next only the concepts in the leaf nodes of the hierarchy.
section. The weight of the edges in the graph is computed
as the overall success rate on exercises performed by the In total there are 138 concepts extracted and mapped to
student which contain the pair of concepts. Pairs of con- QuizJet exercises. Examples of concepts are: Int Data Type,
cepts that do not co-occur in exercises succeeded by the stu- Less Expression, Return Statement, For Statement, Subtract
dent are not connected in her graph. In this representation, Expression, Constant, Constant Initialization Statement, If
highly connected nodes are concepts successfully practiced Statement, Array Data Type, Constructor Definition, etc.
with different other concepts. We also compute a measure We excluded 8 concepts which appear in all exercise tem-
of weight for each node by taking the average weight among plates (for example “Class Definition” or “Public Class Spec-
edges connecting the node. This measure of the success rate ifier” appear in the first line of all exercises). Each concept
on concepts favors exercises that connect more concepts be- appears in one or more Java exercises. Each of the 112 ex-
cause each exercise containing n concepts produce or affects ercises maps to 2 to 47 Java concepts. For example, the
n(n − 1)/2 edges. For example, a success on an exercise hav- exercise “jwhile1”, shown in Figure 1, is mapped to 5 con-
ing 10 concepts contributes to 45 edges, but a successful at- cepts: Int Data Type, Simple Assignment Expression, Less
tempt to an exercise connecting 5 concepts only contributes Expression, While Statement, Post Increment Expression.
to 10 edges. We hypothesize that in a graph built following
2
this approach, metrics like average degree, density, average http://www.sis.pitt.edu/~paws/ont/java.owl
3.2 Graph Metrics Table 1: Correlation between activity measures and grade
To characterize the student knowledge graph we computed and between graph metrics and grade. * significance at 0.1,
standard graph metrics listed below. ** significance at 0.05.
Measure of Activity Corr. Coeff. Sig. (p)
• Graph Density (density): the ratio of the number of Correct Attempts to Exercises .046 .553
edges and the number of possible edges. Distinct Correct Exercises .114 .147
• Graph Diameter (diameter): length of the longest Overall Success Rate .298 .000**
shortest path between every pair of nodes. Avg. Success Rate on Concepts .188 .016**
• Average Path Length (avg.path.len): average among
the shortest paths between all pairs of nodes. Graph Metric Corr. Coeff. Sig. (p)
• Average Degree (avg.degree): average among the Average Degree .150 .055*
degree of all nodes in an undirected graph. Graph Density .102 .190
• Average Node Weight (avg.node.weight): the weight Graph Diameter .147 .081
of a node is the average of the weight of its edges. We Average Path Length .152 .052*
then average the weights of all nodes in the graph. Average Node Weight .201 .010**

3.3 Measures of Activity
To measure student activity so that it could be correlated 4. EXPERIMENTS AND RESULTS
with the graph metrics we collected and calculated the fol- 4.1 Dataset
lowing success measures: We collected student data over three terms of a Java Pro-
gramming course using the system: Fall 2013, Spring 2014,
• Correct Attempts to Exercises (correct.attempts): and Fall 2014. Since the system usage was not mandatory,
total number of correct attempts to exercises. It in- we want to exclude students who just tried the system while
cludes repetition of exercises as well. likely using other activities (not captured by the system)
• Distinct Correct Exercises (dist.correct.attempts): for practicing Java programming. For this we looked at the
number of distinct exercise attempted successfully. distribution of distinct exercises attempted and we exclude
• Overall Success Rate (success.rate): the number all student below the 1st quartile (14.5 distinct exercises
of correct attempts to exercises divided by the total attempted). This left 83 students for our analysis. In to-
number of attempts. tal these students made 8,915 attempts to exercises. On
• Average Success Rate on Concepts average, students have attempted about 55 (Standard Devi-
(avg.concept.succ.rate): we compute the success rate ation=22) distinct exercises while performing an average of
of each concept as the average success rate of the ex- 107 (SD=92) exercises attempts. On average, students have
ercises containing the concept. Then we average this covered about 63 concepts with SD=25 (i.e., succeeded in at
among all concepts in the domain model. least one exercise containing the concept), and have covered
about 773 concept pairs with SD=772 (i.e., succeeded in at
least one exercise covering the concept pair.) The average
3.4 Motivational Factors success rate ( #correct attempts
) across students is about 69%
We use the revised Achievement-Goal Orientation question- #total attempts

naire [8] which contains 12 questions in a 7-point Likert (SD=11%).
scale. There are 3 questions for each of the 4 factors of
the Achievement-Goal Orientation framework : Mastery - 4.2 Graph Metrics and Learning
Approach, Mastery-Avoidance, Performance-Approach and We compare graph metrics (Avg. Degree, Graph Density,
Performance-Avoidance. Mastery-Approach goal orien- Graph Diameter, Avg. Path Length and Avg Node Weight)
tation relates to intrinsic motivation: “I want to learn this and measures of activity (Correct Attempts to Exercises,
because it is interesting for me”, “I want to master this Distinct Correct Exercises, Overall Success Rate and Avg.
subject”; Mastery-Avoidance relates to the attitude of Success Rate on Concepts) by computing the Kendall’s τB
avoid to fail or avoid learning less than the minimum; Per- correlation of these metrics with respect to the students’
formance -Approach goal orientation stresses the idea of grade on the programming course. Results are displayed in
having a good performance and relates well with social com- Table 1.
parison: “I want to perform good in this subject”, “I want to
be better than others here”; and Performance-Avoidance Surprisingly, the plain Overall Success Rate (which does
oriented students avoid to get lower grades or avoid to per- not consider concepts disaggregation, nor graph informa-
form worse than other students. The goal orientation of tion) is better correlated with course grade than any other
a student helps to explain the behavior that the student measure. Students who succeed more frequently, get in gen-
exposes when facing difficulty, but does not label the final eral better grades. Interestingly, both the Average Suc-
achievement of the student. For example, if a student is cess Rate on Concepts and the Average Node Weight
Mastery-Approach oriented, it does not necessarily mean are both significantly correlated with grade. This last mea-
that the student reached the mastery level of the skill or sure uses the graph information and presents a slightly bet-
knowledge. In our case, we believe the achievement-goal ori- ter correlation than the former, which does not consider the
entation of the student can convey the tendency to pursue graph information.
(or avoid) to solve more diverse (and more difficult) exer-
cises, which contain more heterogeneous space of concepts, Among the other graph metrics, Average Degree and Av-
thus contribute to form better connected graphs. erage Path Length are marginally correlated with course
which extent the motivational profile of the student explains
the graph’s shape. Step-wise regression models were used
where the dependent variables are the graph metrics and
the independent variables are the motivational factors. We
found a significant model of the diameter of the graph (R2 =
0.161, F = 6.523, p = 0.006) with the factors Mastery-
Avoidance (B = 0.952, p = 0.001) and Mastery-Approach
(B = −0.938, p = 0.006). Note the negative coefficient for
Mastery-Approach and the positive coefficient for Mastery-
Avoidance. As the Achievement-Goal Orientation frame-
Figure 2: Graph representation of two students. work suggests, Mastery-Approach oriented students are mo-
tivated to learn more, tend to explore more content and do
not give up easily when facing difficulties; Mastery-Avoidance
Table 2: Graph metrics, measures of activity and motiva-
students, in the other hand, do not cope well with diffi-
tional scores of 2 students.
culties and tend to give up. Then, a possible explanation
Student A Student B of the results is that, in one hand, students with higher
Graph Density 0.077 0.094 Mastery-Approach orientation are more likely to solve diffi-
Graph Diameter 2.85 2.00 cult questions which connects more and more distant con-
Avg. Path Length 1.77 1.78 cepts which decreases the graph diameter; and on the other
Avg. Degree 8.64 10.55 hand, Mastery-Avoidance students avoid difficult exercises
Avg. Node Weight 0.49 0.51 containing many concepts, thus making less connections and
Correct Attempts 71 83 producing graphs with higher diameters. Correlations be-
Dist. Correct Exercises 66 61 tween graph metrics and motivational factors confirmed the
Overall Succ. Rate 0.82 0.76 relation between Mastery-Avoidance and Graph Diameter
Avg. Succ.Rate on Concepts 0.50 0.53 (Kendall’s τB = 0.197, p = 0.030). Although these re-
Mastery-Approach 0.83 0.78 sults are encouraging, they are not conclusive. For exam-
Mastery-Avoidance 0.83 0.56 ple, Mastery-Approach students might just do more work,
Performance-Approach 1.0 0.17 not necessarily targeting difficult questions. More analysis
Performance-Avoidance 1.0 0.0 is needed to deeply explore these issues.
Grade (%) 100 97
5. DISCUSSIONS AND CONCLUSIONS
In this paper we proposed a novel approach to represent stu-
grade (p values less than 0.1). Although this is a weak ev- dent model in the form of a dynamic graph of concepts that
idence, we believe that we are in the good track. A higher become connected when the student succeed in assessment
Average Degree means a better connected graph, thus it item containing a pair of concepts to be connected. The
follows our idea that highly connected nodes signal more idea behind this approach is to strengthen the model for
knowledge. Average Path Length is more difficult to in- those concepts that are applied in more different contexts,
terpret. A higher Average Path Length means a less i.e., in assessment items containing other different concepts.
connected graph (which contradicts our assumption), but We applied this approach to data of assessment items an-
also, it can express students reaching more “rear” concepts swered by real students and analyzed the graph properties
which appear in few more-difficult-exercises and generally comparing them to several performance measures such as
have longer shortest paths. We think that further explo- course grade as well as motivational factors. Results showed
ration of metrics among sub-graphs (e.g. a graph for an that this idea is potentially a good indicator of knowledge
specific topic), and further refinement of the approach to of the students, but further refinement of the approach is
build edges (e.g. connecting concepts that co-occur close to needed. We used several measures of the built graphs as
each other in the exercise) could help to clarify these results descriptors of student knowledge level, and we found that a
metric aggregating the success rates of the edges to the level
Figure 2 shows the graphs of 2 students who have similar of concepts (nodes) is highly correlated to course grade, al-
amount of distinct exercises solved correctly but present dif- though it does not beat the plain overall success rate of the
ferent graph metrics and motivational profile. See metrics in student in assessment items.
Table 2. Student B has more edges, lower diameter, higher
density, higher degree, solved less questions more times. Stu- In the future work, we plan to repeat our analysis using
dent A presents a less connected graph although she he/she more reliable approaches to construct the knowledge graph.
solved more distinct questions (66 compared to 61 on Stu- One idea is to use rich information provided by the parser
dent B). Student B has lower Mastery-Avoidance orientation (mapping between exercises and concepts) to ensure that
score and lower Performance orientation scores than Student each new link connects concepts that interact considerably
A, which could explain why Student B work result in a bet- in the program code. This could be done by controlling
ter connected graph. Analyses of Motivational factors are the concepts proximity in the question code (e.g. only con-
described in the following section. sider co-occurrence when concepts are close to each other
in the parser tree.) Another approach to keep more reliable
4.3 Metrics and Motivation edges is to consider only a subset of important concepts
We now explore the relationship between motivational fac- for each problem using feature selection techniques. Also
tors and the graphs of the students. The idea is to see to we plan to perform analyses of sub-graphs targeting specific
“zones” of knowledge. For example, a partial graph with [7] P. Dolog, N. Henze, W. Nejdl, and M. Sintek. The
only concepts that belongs to a specific topic, or concepts personal reader: Personalizing and enriching learning
that are prerequisites of a specific concept. Another inter- resources using semantic web technologies. In
esting idea relates to recommendation of content: guide the P. De Bra and W. Nejdl, editors, Adaptive Hypermedia
and Adaptive Web-Based Systems, volume 3137 of
student to questions that will connect the isolated parts of Lecture Notes in Computer Science, pages 85–94.
the knowledge graph or minimize the average path length of Springer Berlin Heidelberg, 2004.
the graph. Along the same lines, the analysis of the graph [8] A. J. Elliot and K. Murayama. On the measurement
shortest paths and overall connectivity can help in designing of achievement goals: Critique, illustration, and
assessment items that better connect distant concepts. application. Journal of Educational Psychology,
100(3):613, 2008.
[9] B. Hoser, A. Hotho, R. Jäschke, C. Schmitz, and
6. REFERENCES G. Stumme. Semantic network analysis of ontologies.
[1] N. Belacel, G. Durand, and F. Laplante. A binary Springer, 2006.
integer programming model for global optimization of [10] R. Hosseini and P. Brusilovsky. Javaparser: A
learning path discovery. In Workshop on Graph-Based fine-grain concept indexing tool for java exercises. In
Educational Data Mining. The First Workshop on AI-supported Education for
[2] P. Brusilovsky and E. Millán. User models for Computer Science (AIEDCS 2013), pages 60–63, 2013.
adaptive hypermedia and adaptive educational [11] I.-H. Hsiao, S. Sosnovsky, and P. Brusilovsky. Guiding
systems. In P. Brusilovsky, A. Kobsa, and W. Nejdl, students to the right questions: adaptive navigation
editors, The Adaptive Web, volume 4321 of Lecture support in an e-learning system for java programming.
Notes in Computer Science, chapter 1, pages 3–53. Journal of Computer Assisted Learning,
Springer Berlin Heidelberg, Berlin, Heidelberg, 2007. 26(4):270–283, 2010.
[3] V. Cateté, D. Hicks, C. Lynch, and T. Barnes. [12] S. Jiang, S. M. Fitzhugh, and M. Warschauer. Social
Snag’em: Graph data mining for a social networking positioning and performance in moocs. In Workshop
game. In Workshop on Graph-Based Educational Data on Graph-Based Educational Data Mining, page 14.
Mining, volume 6, page 10. [13] T. Käser, S. Klingler, A. G. Schwing, and M. Gross.
[4] C. Conati, A. Gertner, and K. Vanlehn. Using Beyond knowledge tracing: Modeling skill topologies
bayesian networks to manage uncertainty in student with bayesian networks. In Intelligent Tutoring
modeling. User modeling and user-adapted interaction, Systems, pages 188–198. Springer, 2014.
12(4):371–417, 2002. [14] V. Kovanovic, S. Joksimovic, D. Gasevic, and
[5] R. Dekel and Y. Gal. On-line plan recognition in M. Hatala. What is the source of social capital? the
exploratory learning environments. In Workshop on association between social network position and social
Graph-Based Educational Data Mining. presence in communities of inquiry. 2014.
[6] M. C. Desmarais and M. Gagnon. Bayesian student [15] E. Millán, T. Loboda, and J. L. Pérez-de-la Cruz.
models based on item to item knowledge structures. Bayesian networks for student model engineering.
Springer, 2006. Computers & Education, 55(4):1663–1683, 2010.
Graph-based Modelling of Students’ Interaction Data from
Exploratory Learning Environments

Alexandra Poulovassilis Sergio Gutierrez-Santos Manolis Mavrikis
London Knowledge Lab London Knowledge Lab London Knowledge Lab
Birkbeck, Univ. of London Birkbeck, Univ. of London UCL Institute of Education
ap@dcs.bbk.ac.uk sergut@dcs.bbk.ac.uk m.mavrikis@lkl.ac.uk

ABSTRACT learning of algebraic generalisation [15]; and the iTalk2Learn
Students’ interaction data from learning environments has system that aims to support 8-10 year old students’ learning
an inherent temporal dimension, with successive events be- of fractions [7]. Both systems provide students with math-
ing related through the “next event” relationship. Exploratory ematical microworlds in which they undertake construction
learning environments (ELEs), in particular, can generate tasks: in MiGen creating 2-dimensional tiled models using
very large volumes of such data, making their interpretation a tool called eXpresser and in iTalk2learn creating fractions
a challenging task. Using two mathematical microworlds using the FractionsLab tool. In eXpresser, tasks typically re-
as exemplars, we illustrate how modelling students’ event- quire the construction of several models, moving from spe-
based interaction data as a graph can open up new querying cific models involving specific numeric values to a general
and analysis opportunities. We demonstrate the possibilities model involving the use of one or more variables; in parallel,
that graph-based modelling can provide for querying and students are asked to formulate algebraic rules specifying
analysing the data, enabling investigation of student-system the number of tiles of each colour that are needed to fully
interactions and leading to the improvement of future ver- colour their models. In FractionsLab, tasks require the con-
sions of the ELEs under investigation. struction, comparison and manipulation of fractions, and
students are encouraged to talk aloud about aspects of their
constructions, such as whether two fractions are equivalent.
Keywords
Exploratory Learning Environments, Interaction Data, Graph Both systems include intelligent components that provide
Modelling different levels of feedback to students, ranging from un-
solicited prompts and nudges, to low-interruption feedback
1. INTRODUCTION that students can choose to view if they wish. The aim
Much recent research has focussed on Exploratory Learn- of this feedback is to balance students’ freedom to explore
ing Environments (ELEs) which encourage students’ open- while at the same time providing sufficient support to en-
ended interaction within a knowledge domain, coupled with sure that learning is being achieved [9]. The intelligent sup-
intelligent techniques that aim to provide pedagogical sup- port is designed through detailed cognitive task analysis and
port to ensure students’ productive interaction [9]. The data Wizard-of-Oz studies [13], and it relies on meaningful indica-
gathered from students’ interactions with such ELEs pro- tors being detected as students are undertaking construction
vides a rich source of information for both pedagogical and tasks. Examples of such indicators in MiGen are ‘student
technical research, to help understand how students are us- has made a building block’ (part of a model), ‘student has
ing the ELE and how the intelligent support that it provides unlocked a number’ (i.e. has created a variable), ‘student
may be enhanced to better support students’ learning. has unlocked too many numbers for this task’; while ex-
amples of such indicators in FractionsLab are ‘student has
In this paper, we consider how modelling students’ event- created a fraction’, ‘student has changed a fraction’ (numer-
based interaction data as a graph makes possible graph- ator or denominator), ‘student has released a fraction’ (i.e.
based queries and analyses that can provide insights into has finished changing it).
the ways that students are using the affordances of the sys-
tem and the effects of system interventions on students’ be- Teacher Assistance tools can subscribe to receive real-time
haviour. Our case studies are two intelligent ELEs: the information relating to occurrences of indicators for each
MiGen system, that aims to foster 11-14 year old students’ student, and can present aspects of this information visu-
ally to the teacher [8]. Indicators are either task independent
(TI) or task dependent (TD). The former refer to aspects of
the student’s interaction that are related to the microworld
itself and do not depend on the specific task the student is
working on, while the latter require knowledge of the task
the student is working on, may relate to combinations of
student actions, and their detection requires intelligent rea-
soning to be applied (a mixture of case-based, rule-based and
probablistic techniques). Detailed discussions of MiGen’s TI
and TD indicators and how the latter are inferred may be Event EventType
found in [8]. dateTime eventID
taskID eventStatus
occurrenceOf
In this paper we explore how graph-based representation of constrID eventCat
userID
event-based interaction data arising from ELEs such as Mi- sessionID
Gen and FractionsLab can aid in the querying and analysis
of such data, with the aim of exploring both the behaviours next
of the students in undertaking the exploratory learning tasks
set and the effectiveness of the intelligent support being
provided by the system to the students. Data relating to
Figure 1: Core Graph Data Model
learning environments has often been modelled as a graph
in previous work, for example in [10] for providing support
to moderators in e-discussion environments; in [16, 18] for There is a relationship ‘next’ linking an instance of Event
supporting learning of argumentation; in [17] for modelling to the next Event that occurs for the same user, task and
data and metadata relating to episodes of work and learning session. There is a relationship ‘occurrenceOf’ linking each
in a lifelong learning setting; in [1] for learning path discov- instance of Event to an instance of the EventType class.
ery as students “navigate” through learning objects; in [3] for
recognising students’ activity planning in ELEs; and in [23] The instances of the EventType class include: startTask,
for gaining better understanding of learners’ interactions and endTask, numberCreated, numberUnlocked, unlockedNum-
ties in professional networks. berChanged, buildingBlockMade, correctModelRuleCreated,
incorrectModelRuleCreated, interventionGenerated, interven-
Previous work that is close to ours is the work on interac- tionShown, in the case of eXpresser (see [8] for the full list);
tion networks and hint generation [6, 21, 20, 4, 5], in which and startTask, endTask, fractionCreated, fractionChanged,
the graphs used consist of nodes representing states within a fractionReleased, inverventionShown, in the case of Frac-
problem-solving space and edges representing students’ ac- tionsLab.
tions in transitioning between states. This approach targets
learning environments where students are required to select We see that instances of the EventType class have several
and apply rules, and the interaction network aims to rep- attributes, including:
resent concisely information relating to students’ problem-
solving sequences in moving from state to state. Our focus
here differs from this in that we are using graphs to model • eventID: a unique numerical identifier for each type of
fine-grained event-based interaction data arising from ELEs. indicator;
In our graphs, nodes are used to represent indicator occur-
rences (i.e. events, not problem states) and edges between • eventStatus: this may be -1, 0 or 1, respectively stat-
such nodes represent the “next event” relationship. Also, ing that an occurrence of this type of indicator shows
rather than using the information derived from querying and that the student is making negative, neutral or posi-
analysing this data to automatically generate hints, our fo- tive progress towards achieving the task goals; an addi-
cus is on investigating how students are using the system tional status 2 is used for indicators relating to system
and the effects of the system’s interventions in order to un- interventions;
derstand how students interact with the ELEs and improve
their future versions. • eventCat: the category into which this indicator type
falls; for example, startTask and endTask are task-
related indicators; interventionGenerated and inter-
2. GRAPH-BASED MODELLING ventionShown are system-related ones; numberCreated,
Figure 1 illustrates our Graph Data Model for ELE interac- numberUnlocked, unlockedNumberChanged are number-
tion data. We see two classes of nodes: Event — represent- related; and fractionCreated, fractionChanged, frac-
ing indicator occurrences; and EventType — representing tionReleased are fraction-related.
different indicator types. The instances of the Event class
are occurrences of indicators that are detected or generated Figure 2 shows a fragment of MiGen interaction data con-
by the system as each student undertakes a task. We see forming to this graph data model. Specifically, it relates to
that instances of Event have several attributes: dateTime: the interactions of user 5 as he/she is working on task 2
the date and time of the indicator occurrence; userID: the during session 9. The user makes three constructions during
student it relates to; sessionID: the class session that the this task (with constrIDs 1, 2 and 3). The start and end
student was participating in at the time; taskID: the taskID of the task are delimited by an occurrence of the startTask
that the student was working on; and constrID: the con- and endTask indicator type, respectively — events 23041
struction that the student was working on1 . and 33154. We see that the two events following 23041 re-
late to an intervention being generated and being shown to
1
The model in Fig. 1 focusses on the interaction data. The the student (this is likely to be because the student was in-
full data relating to ELEs such as eXpresser and Fraction- active for over a minute after starting the task); following
sLab would also include classes relating to users, tasks, ses- which, the student creates a number — event 24115.
sions and constructions; and attributes describing instances
of these classes, such as a user’s name and year-group, a
task’s name and description, a construction’s content and There are additional attributes relating to events, not shown
description, and a session’s description and duration. here for simplicity, capturing values relating to the student’s
23041 344712
startTask startTask
dateTime: occurrenceOf dateTime: occurrenceOf
20150331091524 eventID:0 20150215091741 eventID:0
taskID:2 eventStatus:0 taskID:56 eventStatus:0
eventCat:taskEv next eventCat:taskEv
constrID:1 constrID:4
userID:5 userID:5
sessionID:9 sessionID:1
intervention- intervention-
next Generated Shown
... fractionChanged fractionReleased
23921 344758
occurrenceOf eventID:6001 eventID:6002 occurrenceOf eventID:1002 eventID:1003
dateTime: eventStatus:2 eventStatus:2 dateTime: eventStatus:1 eventStatus:1
20150331091637 eventCat:systemEv eventCat:systemEv next 20150215091828 eventCat:sfractionEv eventCat:fractionEv
taskID:2 taskID:56
constrID:1 constrID:4
userID:5 userID:5
sessionID:9 occurrenceOf numberCreated sessionID:1 occurrenceOf interventionShown

eventID:1006 eventID:6002
next eventStatus:1 next occurrenceOf eventStatus:2
23923 occurrenceOf eventCat:numberEv 344759 eventCat:systemEv
dateTime: 24115 dateTime: 344760
20150331091638 dateTime: 33154 20150215091828 dateTime: 344761
taskID:2 20150331091655 taskID:56 20150215091832
constrID:1 dateTime: occurrenceOf constrID:4 dateTime:
taskID:2 20150331094453 taskID:56 20150215091833 occurrenceOf
userID:5 constrID:1 userID:5 constrID:4
sessionID:9 next userID:5
... taskID:2 endTask sessionID:1 next userID:5
taskID:56 clickButton
constrID:3 eventID:9999 constrID:4 eventID:3002
sessionID:9 next userID:5 sessionID:1 next userID:5
eventStatus:0 eventStatus:0
sessionID:9 eventCat:taskEv sessionID:1 eventCat:taskEv

Figure 2: Fragment of Graph Data Figure 3: Fragment of Graph Data

of a node would be represented by an edge and its value by
constructions and information relating to the system’s in- a literal-valued node. So, for example, the information that
terventions. For example, for event 24115, the value of the the taskID of event 23041 is 2 would be represented by an
number created, say 5; for event 23921, the feedback strat- taskID
edge 23041 −−−−−→ 2. The query examples in the next section
egy used by the system to generate this intervention, say assume this “classical” graph representation.
strategy 8; and for event 23923, the content of the message
displayed to the user, say “How many green tiles do you need
to make your pattern?” and whether this is a high-level in-
3. GRAPH QUERIES AND ANALYSES
terruption by the system or a low-level interruption that Because the sub-graph induced by edges labelled ‘next’ con-
the student can choose to view or not. Such information sists of a set of paths, the data readily lends itself to explo-
can be captured through additional edges outgoing from an ration using conjunctive regular path (CRP) queries [2]. A
value
event instance to a literal-valued node: 24115 −−−→ 5, 23921 CRP query, Q, consisting of n conjuncts is of the form
strategy message
−−−−−−→ 8, 23932 −−−−−− → “How many green tiles do you need (Z1 , . . . , Zm ) ← (X1 , R1 , Y1 ), . . . , (Xn , Rn , Yn )
level
to make your pattern?”, 23932 −−−→ “high”. Since graph data
where each Xi and Yi is a variable or a constant, each Zi is
models are semi-structured (and graph data therefore does
a variable that appears also in the right hand side of Q, and
not need to strictly conform to a single schema), this kind
each Ri is a regular expression over the set of edge labels.
of heterogeneity in the data is readily accommodated.
In this context, a regular expression, R, has the following
syntax:
Figure 3 similarly shows a fragment of FractionsLab inter-
action data, relating to the interactions of user 5 working R := | a | | (R1 .R2 ) | (R1 |R2 ) | R∗ | R+
on task 56 during session 1. The user makes one construc-
where denotes the empty string, a denotes an edge label,
tion during this task. We see events relating to the stu-
denotes the disjunction of all edge labels, and the operators
dent changing and ‘releasing’ a fraction. Following which
have their usual meaning. The answer to a CRP query on a
the system displays a message (in this case, it was a high-
graph G is obtained by finding for each 1 ≤ i ≤ n a binary
interruption message of encouragement “Great! Well Done”).
relation ri over the scheme (Xi , Yi ), where there is a tuple
(x, y) in ri if and only if there is a path from x to y in G
We see from Figures 2 and 3 that the sub-graph induced by
such that: x = Xi if Xi is a constant; y = Yi if Yi is a
edges labelled ‘next’ consists of a set of paths, one path for
constant; and the concatenation of the edge labels in the
each task undertaken by a specific user in a specific session.
path satisfies the regular expression Ri . The answer is then
The entire graph is a DAG (directed acyclic graph): there
given by forming the natural join of the binary relations
are no cycles induced by the edges labelled ‘next’ since each
r1 , . . . , rn and finally projecting on Z1 , . . . , Zm .
links an earlier indicator occurrence to a later one; while
the instances of EventType and other literal-valued nodes
To illustrate, the following CRP query returns pairs of events
can have only incoming edges. The entire graph is also a
x, y such that x is an intervention message shown to the user
bipartite graph, with the two parts comprising (i) the in-
by the system and y indicates that the user’s next action –
stances of Event, and (ii) the instances of EventType and
in eXpresser – was to create a number (note, variables in
the literal-valued nodes.
queries are distinguished by an initial question mark):
As a final observation, we note that Figures 1 – 3 adopt
a “property graph” notation (e.g. as used in the Neo4J (?X,?Y) <- (?X,occurrenceOf,interventionShown),
graph database, neo4j.com) in which nodes may have at- (?X,next,?Y),
tributes. In a “classical” graph data model, each attribute (?Y,occurrenceOf,numberCreated)
The result would contain pairs such as (23923,24115) from (?X,?Y,?Z) <- (?X,occurrenceOf,interventionGenerated),
Figure 2, demonstrating that there are indeed situations (?X,constrID,?C), (?X,next+,?Y),
where an intervention message displayed by the MiGen sys- (?Y,constrlID,?C), (?Y,occurrenceOf,?Z)
tem leads directly to the creation of a number by the student.

The following query returns pairs of events x, y such that The result would contain triples such as
that x is an intervention message shown to the user by the (23921, 23923, interventionShown),
system and y is the user’s next action; the type of y is also (23921, 24115, numberCreated),
returned, through the variable ?Z: (23921, 24136, numberUnlocked),
(23921, 24189, unlockedNumberChanged),
relating to construction 1 made by user 5 during session 9
(?X,?Y,?Z) <- (?X,occurrenceOf,interventionShown), for task 2 (two more events — 24136 and 24189 — relat-
(?X,next,?Y), ing to construction 1 have been assumed here, in addition
(?Y,occurrenceOf,?Z) to 23923 amd 24115 shown in Figure 2, for illustrative pur-
poses). The results would not contain (23921,33154,end-
The result would contain triples such as (23923,24115,num- Task), since event 33154 relates to construction 3.
berCreated) from Figure 2 and (344760,344761,clickButton)
from Figure 3, allowing researchers to see what types of To show more clearly the answers to the previous query in
events directly follow the display of an intervention mes- the form of possible event paths, we can use extended regular
sage. This would allow the confirmation or contradiction of path (ERP) queries [11], in which a regular expression can
researchers’ expectations regarding the immediate effect of be associated with a path variable and path variables can
intervention messages on students’ behaviours. appear in the left-hand-side of queries. Thus, for example,
the following query returns the possible paths from x to y:
Focussing for the rest of this section on the data in Figure 2,
the following query returns pairs of events x, y such that x is
(?X,?P,?Y,?Z) <-
any type of event and y indicates that the user’s next action
(?X,occurrenceOf,interventionGenerated),
was to unlock a number; the type of x is also returned,
(?X,constrID,?C), (?X,next+:?P,?Y),
through the variable ?Z:
(?Y,constrID,?C), (?Y,occurrenceOf,?Z)

(?X,?Y,?Z) <- (?X,occurrenceOf,?Z),
(?X,next,?Y), The result would contain answers such as
(?Y,occurrenceOf,numberUnlocked) (23921, [next], 23923, interventionShown),
(23921, [next, 23923, next], 24115, numberCreated),
(23921, [next, 23923, next, 24115, next], 24136, numberUn-
The result would allow researchers to see what types of
locked),
events immediately precede the unlocking of a number (i.e.
(23921, [next, 23923, next, 24115, next, 24136, next], 24189,
the creation of a variable). This would allow confirmation
unlockedNumberChanged).
of researchers’ expectations about the design of the MiGen
system’s intelligent support in guiding students towards gen-
The use of the regular expressions next and next+ in the
eralising their models by changing a fixed number to an ‘un-
previous queries matches precisely one edge labelled ‘next’,
locked’ one.
or any number of such edges (greater than or equal to 1),
respectively. However, for finer control and ranking of query
The following query returns pairs of events x, y such that
answers, it is possible to use approximate answering of CRP
that x is an intervention generated by the system and y is
and ERP queries (see [11, 17]), in which edit operations such
any subsequent event linked to x through a path comprising
as insertion, deletion or substitution of an edge label can be
one or more ‘next’ edges; the type of y is also returned,
applied to regular expressions.
through the variable ?Z:
For example, using the techniques described in [11, 17], the
(?X,?Y,?Z) <- (?X,occurrenceOf,interventionGenerated), user can chose to allow the insertion of the label ‘next’ into
(?X,next+,?Y), a regular expression, at an edit cost of 1. Submitting then
(?Y,occurrenceOf,?Z) this query:

The result would contain triples such as (23921, 23923, inter- (?X,?P,?Y,?Z) <-
ventionShown), (23921, 24115, numberCreated), ... (23921, (?X,occurrenceOf,interventionGenerated),
33154, endTask), allowing researchers to see what types of (?X,constrID,?C), APPROX(?X,next:?P,?Y),
events directly or indirectly follow the display of an interven- (?Y,constrID,?C), (?Y,occurrenceOf,?Z)
tion message by the system. This would allow the confirma-
tion or contradiction of researchers’ expectations regarding
the longer-term effect of intervention messages on students’ would return first exact answers, such as
behaviours. (23921, [next], 23923, interventionShown). The regular ex-
pression next in the conjunct APPROX(?X,next:?P,?Y) would
We can modify the query to retain only pairs x, y that relate then be automatically approximated to next.next, leading
to the same construction: to answers such as
Transition Matrix(Session 3)

(23921, [next, 23923, next], 24115, numberCreated) 6003 e s
6002 1001
at an edit distance of 1 from the original query. Following 6001 1002

this, the regular expression next.next would be automati- 5004 1003

cally approximated to next.next.next, leading to answers 5003 1004

such as 5002 1005
(23921, [next, 23923, next, 24115, next], 24136, numberUn-
locked) 5001 1006

at distance 2. This incremental return of paths of increas- 3009 1007

ing length can continue for as long as the user wishes, and 3008 1008

allows researchers to examine increasingly longer-term ef- 3007 1009

fects of intervention messages on students’ behaviours. It 3006 1010
3002 1011
would also be possible for users to specify from the outset a 1015 1014

minimum and maximum edit distance to be used in approx- 0.96342
0.00024384

imating and evaluating the query, for example to request
paths encompassing between 2 and 4 edges labelled ‘next’.
Figure 4: Transitions between Event Types
Queries based on evaluating regular expressions over a graph-
based representation of interaction data, such as those above,
can aid in the exploration of students’ behaviours as they are applied across a whole dataset, or focussing on partic-
undertaking tasks using ELEs and the effectiveness of the ular students, tasks or sessions;
intelligent support being provided by the ELE. The query
• nodes that have a high probability of being visited on a
processing techniques employed are based on incremental
randomly chosen shortest path between two randomly
query evaluation algorithms which run in polynomial time
chosen nodes have high betweenness centrality; deter-
with respect to the size of the database graph and the size
mining this measure for pairs of event type nodes (ig-
of the query and which return answers in order of increasing
noring the directionality of the ‘occurrenceOf’ edges)
edit distance [11]. A recent paper [19] gives details of an
would identify event types that play key mediating
implementation, which is based on the construction of an
roles between other event types.
automaton (NFA) for each query conjunct, the incremental
construction of a weighted product automaton from each
conjunct’s automaton and the data graph, and the use of We have already undertaken some ad hoc analyses of in-
a ranked join to combine answers being incrementally pro- teraction data arising from classroom sessions using ELEs.
duced from the evaluation of each conjunct. The paper also For example, Figure 4 shows the normalised incoming tran-
presents a performance study undertaken on two data sets sitions for a 1-hour classroom session involving 22 students
— lifelong learning data and metadata [17] and YAGO [22]. using MiGen (in the diagram, s denotes the ‘startTask’ and
The first of these has rather ‘linear’ data, similar to the in- e the ‘endTask’ event types). Event types with an adja-
teraction data discussed here, while the second has ‘bushier’ cent circle show transitions where this type of event occurs
connectivity. Query performance is generally better for the repeatedly in succession. The thickness of each arrow or
former than the latter, and the paper discusses several pos- circle indicates the value of the transition probability: the
sible approaches towards query optimisation. thicker the line, the higher the probability. Red (light grey)
is used for probabilities < 0.2 and black for probabilities
In addition to evaluating queries over the interaction data, ≥ 0.2. We can observe a black arrow 3007 → 1005, indicat-
by representing the data in the form of a graph it is possible ing transitions from events of type 3007 (detection by the
to apply graph structure analyses such as the following: system that the student has made an implausible building
block for this task) to events of type 1005 (modification of a
• path finding and clustering: this would be useful for rule by the student). Such an observation raises a hypoth-
determining patterns of interest across a whole dataset, esis for more detailed analysis or further student observa-
or focussing on particular students, tasks or sessions tion, namely: “does the construction of an incorrect building
c.f. [4]; block lead students to self-correct their rules?”. Developing
a better understanding of such complex interaction can lead
• average path length: this would be useful for determin- to improvement of the system. For this particular example,
ing the amount of student activity (i.e. the number of we designed a new prompt that suggests to students to first
indicator occurrences being generated per task) across consider the building block against the given task before
a whole dataset, or focussing on particular students, proceeding unnecessarily in correcting their rules. More ex-
tasks or sessions; amples of such ad hoc analyses are given in [14]. Represent-
ing the interaction data in graph form will allow more sys-
• graph diameter: to determine the greatest distance be- tematic, flexible and scalable application of graph-structure
tween any two nodes (which, due to the nature of the algorithms such as those identified above.
data, would be event type nodes); this would be an in-
dication the most long-running and/or most intensive
task(s); 4. CONCLUSIONS AND FUTURE WORK
We have presented a graph model for representing event-
• degree centrality: determining the in-degree centrality based interaction data arising from Exploratory Learning
of event type nodes would identify key event types oc- Environments, drawing on the data generated when students
curring in students’ interactions; this analysis could be undertake exploratory learning tasks with the eXpresser and
FractionsLab microworlds. Although developed in the con- networks: Generating high level hints based on
text of these systems, the model is a very general one and network community clustering. EDM, 2012.
can easily be used or extended to model similar data from [7] B. Grawemeyer and et al. Light-bulb moment?:
other ELEs. Towards adaptive presentation of feedback based on
students’ affective state. IUI, pages 400–404, 2015.
We have explored the possibilities that evaluating regular [8] S. Gutierrez-Santos, E. Geraniou, D. Pearce-Lazard,
path queries over this graph-based representation might pro- and A. Poulovassilis. Design of Teacher Assistance
vide for exploring the behaviours of students as they are Tools in an Exploratory Learning Environment for
working in the ELE and the effectiveness of the intelligent Algebraic Generalization. IEEE Trans. Learn. Tech.,
support that it provides to them. We have also identified 5(4):366–376, 2012.
additional graph algorithms that may yield further insights [9] S. Gutierrez-Santos, M. Mavrikis, and G. D. Magoulas.
about learners, tasks and significant indicators. A Separation of Concerns for Engineering Intelligent
Support for Exploratory Learning Environments. J.
Planned worked includes transformation and uploading of Research and Practice in Inf. Tech., 44:347–360, 2013.
the interaction data sets gathered during trials and full class- [10] A. Harrer, R. Hever, and S. Ziebarth. Empowering
room sessions of the two systems into an industrial-strength researchers to detect interaction patterns in
graph database such as Neo4J, following the graph model e-collaboration. Frontiers in Artificial Intelligence and
presented in Section 2; followed by the design, implemen- Applications, 158:503, 2007.
tation and evaluation of meaningful queries, analyses and
[11] C. Hurtado, A. Poulovassilis, and P. Wood. Finding
visualisations over the graph data, building on the work
top-k approximate answers to path queries. FQAS,
presented in Section 3. Equipped with an appropriate user
pages 465–476, 2009.
interface, educational researchers, designers or even teach-
ers with less technical expertise could in this way explore [12] K. Koedinger and et al. A data repository for the
the data from their perspective. This has the potential to EDM community: The PSLC datashop. Handbook of
lead to an improved understanding of interaction in this Educational Data Mining, 43, 2010.
context and to feed back to the design of the ELEs. We [13] M. Mavrikis and S. Gutierrez-Santos. Not all Wizards
see this approach very much in the spirit of “polyglot per- are from Oz: Iterative design of intelligent learning
sistence” (i.e. using different data storage methods to ad- environments by communication capacity tapering.
dress different data manipulation problems), and hence be- Computers and Education, 54(3):641–651, 2010.
ing used in conjunction with other EDM resources such as [14] M. Mavrikis, Z. Zheng, S. Gutierrez-Santos, and
DataShop [12]. Another direction of research is investigation A. Poulovassilis. Visualisation and analysis of
of how the flexible querying processing techniques for graph students’ interaction data in exploratory learning
data (including both query approximation and query relax- environments. Workshop on Web-Based Technology
ation) that have been developed in the context of querying for Training and Education (at WWW), 2015.
lifelong learners’ data and metadata [11, 17] might be ap- [15] R. Noss and et al. The design of a system to support
plied or adapted to the much finer-granularity interaction exploratory learning of algebraic generalisation.
data described here and the more challenging pedagogical Computers and Education, 59(1):63–82, 2012.
setting of providing effective intelligent support to learners [16] N. Pinkwart and et al. Graph grammars: An ITS
undertaking exploratory tasks in ELEs. technology for diagram representations. FLAIRS,
pages 433–438, 2008.
Acknowledgments [17] A. Poulovassilis, P. Selmer, and P. Wood. Flexible
querying of lifelong learner metadata. IEEE Trans.
This work has been funded by the ESRC/EPSRC MiGen
Learn. Tech., 5(2):117–129, 2012.
project, the EU FP7 projects iTalk2Learn (#318051) and
M C Squared (#610467). We thank all the members of [18] O. Scheuer and B. McLaren. CASE: A configurable
these projects for their help and insights. argumentation support engine. IEEE Trans. Learn.
Tech., 6(2):144–157, 2013.
[19] P. Selmer, A. Poulovassilis, and W. P.T. Implementing
5. REFERENCES flexible operators for regular path queries. GraphQ
[1] N. Belacel, G. Durand, and F. LaPlante. A binary (EDBT/ICDT Workshops), pages 149–156, 2015.
integer programming model for global optimization of [20] V. Sheshadri, C. Lynch, and T. Barnes. InVis: An
learning path discovery. G-EDM, 2014. EDM tool for graphical rendering and analysis of
[2] D. Calvanese and et al. Containment of conjunctive student interaction data. G-EDM, 2014.
regular path queries with inverse. KRR, pages [21] J. Stamper, M. Eagle, T. Barnes, and M. Croy.
176–185, 2015. Experimental evaluation of automatic hint generation
[3] R. Dekel and K. Gal. On-line plan recognition in for a logic tutor. Artificial Intelligence in Education,
exploratory learning environments. G-EDM, 2014. pages 345–352, 2011.
[4] M. Eagle and T. Barnes. Exploring differences in [22] F. Suchanek, G. Kasneci, and G. Weikum. YAGO: a
problem solving with data-driven approach maps. core of semantic knowledge. WWW, 2007.
EDM, 2014. [23] D. Suthers. From contingencies to network-level
[5] M. Eagle, D. Hicks, B. Peddycord III, and T. Barnes. phenomena: Multilevel analysis of activity and actors
Exploring networks of problem-solving interactions. in heterogeneous networked learning environments.
LAK, pages 21–30, 2015. LAK, 2015.
[6] M. Eagle, M. Johnson, and T. Barnes. Interaction
Exploring the function of discussion forums in MOOCs:
comparing data mining and graph-based approaches
Lorenzo Vigentini Andrew Clayphan
Learning & Teaching Unit Learning & Teaching Unit
UNSW Australia, UNSW Australia,
Lev 4 Mathews, Kensington 2065 Lev 4 Mathews, Kensington 2065
+61 (2) 9385 6226 +61 (2) 9385 6226
l.vigentini@unsw.edu.au a.clayphan@unsw.edu.au

ABSTRACT learning environments [6]. These can be deployed in a variety of
In this paper we present an analysis (in progress) of a dataset ways, ranging from a tangential support resource which students
containing forum exchanges from three different MOOCs. The can refer to when they need help, to a space for learning with
forum data is enhanced because together with the exchanges and others, driven by the activities students have to carry out (usually
the full text, we have a description of the design and pedagogical sharing work and eliciting feedback). The latter, in a sense,
function of forums in these courses and a certain level of detail emulates class-time in traditional courses providing a space for
about the users, which includes achievement, completion, and in structured discussions about the topics of the course. One could
some instances more details such as: education; employment; age; argue that like in face-to-face classes, the value of the interaction
and prior MOOC exposure. depends on the importance attributed to the forums by the
instructors. This is an interesting point to explore teachers’
Although a direct comparison between the datasets is not possible presence and the value of their input in directing such
because the nature of the participants and the courses are conversations. Mazzolini & Maddison characterize the role of the
different, what we hope to identify using graph-based techniques teacher and teacher presence in online discussion forums as
is a characterization of the patterns in the nature and development varying from being the ‘sage on the stage’, to the ‘guide on the
of communication between students and the impact of the ‘teacher side’ or even ‘the ghost in the wings’ [7]. Furthermore they argue
presence’ in the forums. With the awareness of the differences, we that the ‘ideal’ degree of visibility of the instructor in discussion
hope to demonstrate that student engagement can be directed ‘by- forums depends on the purpose of forums and their relationship to
design’ in MOOCs: teacher presence should therefore be planned assessment. There are also a number of accounts indicating that
carefully in the design of large-scale courses. students’ learning in forums is not very effective [8, 9]. However
if one looks at the data there are numerous examples indicating
Keywords that behaviours in forums are good predictors of performance in
MOOCs, Discussion forums, graph-based EDM, pedagogy. the courses using them, particularly if forum activities are
assessed [10,11,12,13]. Yet, forums in MOOCs tend to attract
only a small portion of the student activity [14]. This is setting
1. INTRODUCTION forums in MOOCs apart from ‘tutorial-type’ forums used to
In the past couple of years MOOCs (Massive Open Online support students’ learning in online or blended courses in higher
Courses) have become the center of much media hype as education. Furthermore, some argue that active engagement is not
disruptive and transformational [1, 2]. Although the focus has the only way of benefiting from discussion forums [15] and
been on a few characteristics of the MOOCS – i.e. free courses, students’ characteristics and preferences could be more important
massive numbers, massive dropouts and implicit quality than the course design in determining the way in which they take
warranted by the status of the institutions delivering these courses full advantage of online resources [16].
– a rapidly growing research interest has started to question the
effectiveness of MOOCS for learning and their pedagogies. If one
ignores entirely the philosophies of teaching driving the design
2. THE THREE MOOCS IN DETAIL
and delivery of MOOCs going from the the socio-constructivist In order to investigate the way in which students use the
(cMOOC, [4, 5]) to instructivist (xMOOC, [3]), at the practical discussion forums, we have extracted data from three MOOCS
level, instructors have to make specific choices about how to use delivered by a large, research intensive Australian university. The
the tools available to them. One of these tools is the discussion three courses are: P2P (From Particles to Planets - Physics);
forum. Forums are one of the most popular asynchronous tools to LTTO (Learning to Teach Online); and INTSE (Introduction to
support students’ communication and collaboration in web-based Systems Engineering), which are broadly characterised in the top
of Table 1. The courses were specifically designed in quite
different ways to test hypotheses about their design, delivery and
effectiveness.
In particular, P2P was designed emulating a traditional university
course in a sequential manner. All content was released on a
week-by-week basis dictating the pace of instruction. LTTO and
INTSE, instead were designed to provide a certain level of
flexibility for the students to elect their learning paths. All content
was readily available at the start, however for LTTO, the delivery in forum is 4x in magnitude compared to the other courses. Yet, if
followed a week-on-week delivery focusing on the interaction we look at the average amount of posts or comments, the patterns
with students and a selective attention to particular weekly topics are not straightforward to interpret, as the level of engagement is
(i.e. weekly feedback videos driven by the discussion forums as similar across the courses with 3 to 5 posts per student and 1 to 3
well as weekly announcements). Although announcements were comments (i.e. replies to existing posts), but with P2P showing a
used also in INTSE, the lack of weekly activities in the forums did higher level of engagement than the other courses. One possible
not impose a strong pacing. In INTSE, the forums had only a explanation is the different target group of the different courses
tangential support value and were used mainly to respond to with INTSE including a majority of professional engineers with
students’ queries and to clarify specific topics emerging from the postgraduate qualifications, P2P focusing on high school student
quizzes. Table 1 provides an overview of the different courses. and teachers, and LTTO targeting a broad base of teachers across
This also shows that the forum activity in the various courses is a different educational levels.
very small portion of all actions emerging from the logs of activity
which has been reported in the literature [9]. INTSE LTTO P2P
Teachers at all High school
Target group Engineers
3. DETAILS OF THE DATASET levels and teachers
3.1 The dataset Course length 9 weeks 8 weeks 8 weeks
The data under consideration is an export form the Coursera 54 105 63
Forums
platform. Raw forum database tables (posts, comments, tags, (14 top level) (17 top-level) (15 top-level)
votes) as well as a JSON based web clickstream were used. The Design mode All-at-once All-at-once Sequential
clickstream events consist of a key which specifies action – either
a ‘pageview’ or ‘video’ item. Forum clickstream events were Delivery mode All-at-once Staggered Staggered
identified by a common ‘/forum’ prefix. Use of forums Tangential Core activity Support
The clickstream was further classified into: browsing; profile N in forum 422 (2.1%) 1685 (9.3%) 293 (2.8%)
lookups; social interaction (looking at contributions); search;
tagging; and threads. From the classification it became evident the 1361 6361 1399
Tot posts
clickstream did not record all events, such as when a post or (avg=3.3) (avg=3.8) (avg=4.8)
comment was made, or when votes were applied. For these, 285 2728 901
Tot comments
specific database tables were used. In order to manage different (avg=0.7) (avg=1.7) (avg=3.1)
data sets and sources, a standardized schema was built, allowing Registrants 32705 28558 22466
disparate sources to feed into, but exposing a common interface to
conduct analysis over forum activities. This is shown in Figure 1. Active
60% 63% 47%
students1
Figure 1. Forum data transformation process 4.2% 4.4% 0.7%
Completing2
(0.3% D) (2.4 D) (0.2%)
Table 1. Summary of the courses under investigation. NOTE:
1. Active students are those appearing in the clickstream; 2.
Completing students achieve the pass grade or Distinction (D)
The type of activity is summarised in Figure 2. In the chart, the
five categories refer to the following: View corresponds to listing
forums, threads and viewing posts; Post is the writing of a post or
start of a new thread; Comment is a reply to an existing post;
Social refers to all actions engaging directly with other’s status
(up-vote, down-vote and looking at profiles/reputation); Engage
refers to the additional interaction with forums content (searching,
tagging, ‘watching’ or subscribing to posts or threads).
The viewing behaviour is the most prominent for both the student
and instructor groups and the figures are pretty much similar
across the board. A two-way ANOVA (2x5, role by activity) on
the percentage of distributions, shows that there is no significant
difference between students and instructors, but there is an
obvious difference between views and the other types of
behaviour (F(4,29) = 1656.3, p < .01).
If we consider the engagement over the timeline and compare the
type of activities carried out by students and instructors, Figure 3
3.2 An overview of forums activity (end of the paper) shows the patterns for the three courses. The
There are very interesting trends which require more detailed most striking pattern is that there doesn’t seem to be an obvious
examination (bottom of table 1). As expected, in LTTO the forum one. For what concerns posts and views in all the three courses
activity is larger than in the other courses and this is probably due there is a sense of synchronicity between the two groups, however
to the fact that students were asked to submit post in forums from this chart it is not possible to understand in more detail what
following the learning activities. The proportion of active students are the connections between what students and teachers do.
Instructors’ comments are slightly offset, possibly as a reaction to activity of students in the forums is organized according to a set of
students’ posts. An interesting aspect is the amount of ‘social’ commonly used quantitative metrics and a couple of measures
engagement in the P2P course that merits further analysis. borrowed from Social Network Analysis (table 2). Although this
seems to be a promising approach, there are two issues with this
4. DIRECTIONS AND OPEN QUESTIONS methodology in the MOOCs: 1) only a tiny proportion of students
From this coarse analysis it is apparent that there seem to be can be considered active and 2) it is hard to scale the instructor’s
minimal behavioural differences in the way students and evaluation. The first problem is not easily resolved and it is an
instructors interact in the different courses, however more analysis issue in the literature reviewed [17, 18]; non-posting behavior is
is required to tackle questions about the individual differences in considered as an index of disengagement, partly because this is
students’ and instructors’ patterns of interaction and their easy to measure. In principle the latter could be substituted by
interrelations. Furthermore little can be said about how the nature peer evaluation (up-vote, down-vote), but there is no easy way to
of interactions drives the development of communication and ensure consistency.
engagement. However a number of questions like the following
Indicator Type Description
remain open and unanswered: how do discussions develop over
time? How teacher presence affects the development of Messages Quantitative Number of messages written by
discussions? Is the number of forums affecting how students the student.
engage with them (i.e. causing disorientation)? Threads Quantitative Number of new threads created
by the student.
Figure 2. Distribution of forum activities by role
Words Quantitative Number of words written by
the student.
Sentences Quantitative Number of sentences written
by the student.
Reads Quantitative Number of messages read on
the forum by the student.
Time Quantitative Total time, in minutes, spent
on forum by the student.
AvgScoreMsg Qualitative Average score on the
instructor's evaluation of the
student's messages.
Centrality Social Degree centrality of the
student.
Prestige Social Degree prestige of the student.
Table 2. Possible indicators characterising forum engagement

An alternative method that can be explored is graph-based
approaches. For example, Bhattacharya et al. [19] used graph-
based techniques to explore the evolution of software and source
branching providing an insight in the process. Kruck et al. [20]
developed GSLAP, an interactive, graph‐based tool for analyzing
web site traffic based on user‐defined criteria.
Kobayashi et al. [18] used a method to quickly identify and track
the evolution of topics in large datasets using a mix of assignment
of documents to time slices and clustering to identify discussion
topics. Yang et al [21] integrated graph-based clustering to
characterize the emergence of communities and text-based
analysis to portray the nature of exchanges. In fact, students move
in the various sub-forums taking different roles or stances as they
engage with different subsets of students. As the reasons to
engage in these discussions are partly determined by different
interests, goals, and issues, it is possible to construct a social
network graph based on the post-reply-comment structure within
threads. The network generated provides a possible view of a
4.1 The DM and graph-based approaches student’s social participation within a MOOC, which may indicate
A possible way to answer the questions about the types/patterns of some detail about their values, beliefs and intentions.
behaviours, the structure and development of networks and the
growth of groups/communities over time might be using data Furthermore, Brown et al [22] have already shown the value of
mining and graph-based approaches. For example, [6] used a exploring the communities in discussion forums in MOOCs
combination of quantitative, qualitative and social network particularly for what concerns the homogeneity of performance
information about forum usage to predict students' success or but dissimilarity of motivations characterizing student hubs.
failure in a course by applying classification algorithms and
classification via clustering algorithms. In their approach the
4.2 Discussion points [6] Cristóbal Romero, Manuel-Ignacio López, Jose-María Luna,
The examples above provide evidence of the potential for using and Sebastián Ventura. 2013. Predicting students’ final
graph-based methods to obtain better insights into the process and performance from participation in on-line discussion forums.
content analysis for our dataset and to extend its applicability to Computers & Education 68 (October 2013), 458–472. DOI:
MOOCs, however there are a number of contentious points to http://dx.doi.org/10.1016/j.compedu.2013.06.009
raise which will provide opportunities for discussion. [7] Margaret Mazzolini and Sarah Maddison. 2007. When to
Firstly the number of students who are actively involved in jump in: The role of the instructor in online discussion
discussion is a very small proportion of the active participants. forums. Computers & Education 49, 2 (September 2007),
This means that the subset may not be representative at all. One 193–213.
could argue that these students are already engaged or desperately DOI:http://dx.doi.org/10.1016/j.compedu.2005.06.011
need help. Previous literature [21, 22, 23] focused on the ability [8] M.j.w. Thomas. 2002. Learning within incoherent structures:
to predict performance and on the peer effect which can emerge the space of online discussion forums. Journal of Computer
from the analysis of the graphs/social networks. Assisted Learning 18, 3 (September 2002), 351–366.
Secondly, one could question the value of the communities in DOI:http://dx.doi.org/10.1046/j.0266-4909.2002.03800.x
xMOOCs: especially when courses are designed with an [9] Daniel F.O. Onah, Jane Sinclair, and Russell Boyatt. 2014.
instructivits approach leading to mastery, by definition this is an Exploring the use of MOOC discussion forums. In
individualistic perspective focused on the testing of one’s own Proceedings of London International Conference on
skills/learning. Of course in cMOOCs -connectivists by design- Education. London: LICE, 1–4.
the importance of the development of social support is essential. [10] Alstete, J.W. and Beutell, N.J. Performance indicators in
This seems to be supported by Brown et al [22]: they were not online distance learning courses: a study of management
able to uncover a direct relation between stated goals and education. Quality Assurance in Education 12, 1 (2004), 6–
motivations with the participation in forums, and attributed this to 14.
pragmatic needs. However, as the authors suggested earlier, the
instructors might play a fundamental role in shaping the [11] Cheng, C.K., Paré, D.E., Collimore, L.-M., and Joordens, S.
communities based on the value attributed to forums in their Assessing the effectiveness of a voluntary online discussion
plans/design and the level of engagement/interaction. Considering forum on improving students’ course performance.
the split between cMOOCs and xMOOCs again, interesting work Computers & Education 56, 1 (2011), 253–261.
might come out of the experiment conducted by Rose’ and [12] Palmer, S., Holt, D., and Bray, S. Does the discussion help?
colleagues in the DALMOOC in which automated agents were The impact of a formally assessed online discussion on final
deployed to support students’ conversations. In Coursera the student results. British Journal of Educational Technology
deployment of ‘community mentors’ will be an interesting space 39, 5 (2008), 847–858.
to explore, given that the importance of design seems to be
[13] Patel, J. and Aghayere, A. Students’ Perspective on the
removed from instructors in the ‘on-demand’ model.
Impact of a Web-based Discussion Forum on Student
Lastly, more research is needed in the time-based dimension of Learning. Frontiers in Education Conference, 36th Annual,
development of forums in MOOCs. Questions like how students (2006), 26–31.
bond and create stable relations, how they become authoritative
[14] Jacqueline Aundree Baxter and Jo Haycock. 2014. Roles and
and what motivates them to contribute over time are all open
student identities in online large course forums: Implications
questions which the analysis of graphs over time might be able to
for practice. The International Review of Research in Open
address.
and Distributed Learning 15, 1 (January 2014).

5. REFERENCES [15] Vanessa Paz Dennen. 2008. Pedagogical lurking: Student
[1] Dirk Jan van den Berg and Edward Crawley. Why MOOCS engagement in non-posting discussion behavior. Computers
Are Transforming the Face of Higher Education. Retrieved in Human Behavior 24, 4 (July 2008), 1624–1633.
April 12, 2015 from http://www.huffingtonpost.co.uk/dirk- DOI:http://dx.doi.org/10.1016/j.chb.2007.06.003
jan-van-den-berg/why-moocs-are- [16] René F. Kizilcec, Chris Piech, and Emily Schneider. 2013.
transforming_b_4116819.html Deconstructing Disengagement: Analyzing Learner
[2] Chris Parr. The evolution of Moocs. Retrieved April 12, Subpopulations in Massive Open Online Courses. In
2015 from Proceedings of the Third International Conference on
http://www.timeshighereducation.co.uk/comment/opinion/th Learning Analytics and Knowledge. LAK ’13. New York,
e-evolution-of-moocs/2015614.article NY, USA: ACM, 170–179.
DOI:http://dx.doi.org/10.1145/2460296.2460330
[3] C. Osvaldo Rodriguez. 2012. MOOCs and the AI-Stanford
Like Courses: Two Successful and Distinct Course Formats [17] Dennen, V.P. Pedagogical lurking: Student engagement in
for Massive Open Online Courses. European Journal of non-posting discussion behavior. Computers in Human
Open, Distance and E-Learning (January 2012). Behavior 24, 4 (2008), 1624–1633.

[4] George Siemens. 2005. Connectivism: A learning theory for [18] Kobayashi, M. and Yung, R. Tracking Topic Evolution in
the digital age. International journal of instructional On-Line Postings: 2006 IBM Innovation Jam Data. In T.
technology and distance learning 2, 1 (2005), 3–10. Washio, E. Suzuki, K.M. Ting and A. Inokuchi, eds.,
Advances in Knowledge Discovery and Data Mining.
[5] Stephen Downes. 2008. Places to go: Connectivism & Springer Berlin Heidelberg, 2008, 616–625.
connective knowledge, Innovate.
[19] Bhattacharya, P., Iliofotou, M., Neamtiu, I., and Faloutsos, [22] R. Brown, C Lynch, Y. Wang, M. Eagle, J. Albert, T.
M. Graph-based Analysis and Prediction for Software Barnes, R. Baker, Y. Bergner, D. McNamara. Communities
Evolution. Proceedings of the 34th International Conference of performance and communities of preference. GEDM
on Software Engineering, IEEE Press (2012), 419–429. 2015, in press.
[20] Kruck, S.E., Teer, F., and Jr, W.A.C. GSLAP: a graph-based [23] M. Fire, G. Katz, Y. Elovici, B. Shapira, and L. Rokach,
web analysis tool. Industrial Management & Data Systems “Predicting Student Exam’s Scores by Analyzing Social
108, 2 (2008), 162–172. Network Data,” in Active Media Technology, R. Huang, A.
A. Ghorbani, G. Pasi, T. Yamaguchi, N. Y. Yen, and B. Jin,
[21] D. Yang, M. Wen, A. Kumar, E. P. Xing, and C. P. Rose,
Eds. Springer Berlin Heidelberg, 2012, pp. 584–595.
“Towards an integration of text and graph clustering methods
as a lens for studying social interaction in MOOCs,” The
International Review of Research in Open and Distributed
Learning, vol. 15, no. 5, Oct. 2014.

Figure 3. Time sequence of activity in the forums in the three courses by students and instructors grouped by activity type
BOTS: Selecting Next-Steps from Player Traces in a Puzzle
Game

Drew Hicks Yihuan Dong Rui Zhi
North Carolina State North Carolina State North Carolina State
University University University
911 Oval Drive 911 Oval Drive 911 Oval Drive
Raleigh, NC 27606 Raleigh, NC 27606 Raleigh, NC 27606
aghicks3@ncsu.edu ydong2@ncsu.edu rzhi@ncsu.edu
Veronica Cateté Tiffany Barnes
North Carolina State North Carolina State
University University
911 Oval Drive 911 Oval Drive
Raleigh, NC 27606 Raleigh, NC 27606
vmcatete@ncsu.edu tmbarnes@ncsu.edu

ABSTRACT model student behavior in tutors, and provide insight into
In the field of Intelligent Tutoring Systems, data-driven meth- problem-solving strategies and misconceptions. This data
ods for providing hints and feedback are becoming increas- structure can be used to provide hints, by treating the Inter-
ingly popular. One such method, Hint Factory, builds an action Network similarly to a Markov Decision Process and
interaction network out of observed player traces. This data selecting the most appropriate next step from the requesting
structure is used to select the most appropriate next step user’s current state. This method has successfully been em-
from any previously observed state, which can then be used ployed in systems in which each action a player may take is
to provide guidance to future players. However, this method of similar cost; for example in the Deep Thought logic tutor
has previously been employed in systems in which each ac- each action is an application of a particular axiom. Applying
tion a player may take requires roughly similar effort; that this method to an environment where actions are of differ-
is, the “step cost” is constant no matter what action is taken. ent costs, or outcomes are of varying value will require some
We hope to apply similar methods to an interaction network adaptations to be made. In this work, we discuss how we
built from player traces in our game, BOTS; However, each will apply Hint Factory methods to an interaction network
edge can represent a varied amount of effort on the part of built from player traces in a puzzle game, BOTS. In BOTS,
the student. Therefore, a different hint selection policy may each “Action” is the set of changes made to the program
be needed. In this paper, we discuss the problems with our between each run. Therefore, using the current hint selec-
current hint policy, assuming all edges are the same cost. tion policy would result in very high-level hints comprising
Then, we discuss potential alternative hint selection policies a great number of changes to the student’s program. Since
we have considered. this is undesirable, a different hint selection policy may be
needed.
Keywords
Hint Generation, Serious Games, Data Mining 2. DATA-DRIVEN HINTS AND FEEDBACK
In the ITS community, several methods have been proposed
1. INTRODUCTION for generating hints/feedback from previous observations of
Data-driven methods for providing hints and feedback are users’ solutions or behavior. Rivers et al propose a data-
becoming increasingly popular, and are especially useful for driven method to generate hints automatically for novice
environments with user- or procedurally-generated content. programmers based on Hint Factory[8]. They present a
One such method, Hint Factory, builds an interaction net- domain-independent algorithm, which automates hint gener-
work out of observed player traces. An Interaction Network ation. Their method relies on solution space, which utilizes
is a complex network of student-tutor interactions, used to graph to represent the solution states. In solution space,
each node represents a candidate solution and each edge rep-
resents the action used to transfer from one state to another.
Due to the existence of multiple ways to solve a program-
ming problem, the size of the solution space is huge and thus
it is impractical to use. A Canonicalizing model is used to
reduce the size of the solution space. All states are trans-
formed to canonicalized abstract syntax trees (ASTs). If the
canonical form of two different states are identical, they can
be combined together. After simplifying the solution space,
hint generation is implemented. If the current state is in-
correct and not in the solution space, the path construction
algorithm will find an optimal goal state in the solution space
which is closest to current state. This algorithm uses change
vectors to denote the change between current state and goal
state. Once a better goal state is found during enumerat-
ing all possible changes, it returns the current combination
of change vectors. Each change vector can be applied to
current state and then form an intermediate state. The in-
termediate states are measured by desirability score, which
represents the value of the state. And then the path con-
struction algorithm generates optimal next states based on
the rank of the desirability scores of all the intermediate
states. Thus a new path can be formed and added to the
solution space, and appropriate hints can be generated.

Jin et al propose linkage graph to generate hints for pro-
gramming courses[4]. Linkage graph uses nodes to represent
program statements and direct edges to indicate ordered de-
Figure 1: The BOTS interface. The robot’s program
pendencies of those statements. Jin’s approach applies ma-
is along the left side of the screen. The “toolbox” of
trix to store linkage graph for computation. To generate
available commands is along the top of the screen.
linkage matrix, first, they normalize variables in programs
by using instructor-provided variable specification file. After
variable normalization, they sort the statement with 3 steps:
(i) preprocessing, which breaks a single declaration for mul- information and check if hints are available for the state.
tiple variables (e.g. int a, b, c) into multiple declaration The action that leads to subsequent state with the highest
statements (e.g. int a; int b; int c;); (ii) creating statement value is used to generate a hint sequence. A hint sequence
sets according to variable dependencies, which put indepen- consists of four types of hints and are ordered from gen-
dent statements into first set, put statements depend only eral hint to detailed hint. Hint provider will then show hint
on statements in the first set into second set, put statements from top of the sequence to the student. Hint Factory has
depends only on statements in the first and second set into been applied in logic tutors which helps students learn logic
third set, and so on; (iii) in-set statement sorting, during proof. The result shows that the hint-generating function
which the statements are sorted in decreasing order within could provide hints over 80% of the time.
set using their variable signatures. In hint generation, they
first generate linkage graphs with a set of correct solutions, 3. BOTS
as the sources for hint generation. They also compose the
BOTS is a programming puzzle game designed to teach
intermediate steps during program development into a large
fundamental ideas of programming and problem-solving to
linkage graph, and assign a reward value to each state and
novice computer users. The goal of the BOTS project is
the correct solution. Then, they apply value iteration to
to investigate how to best use community-authored content
create a Markov Decision Process (MDP). When a student
within serious games and educational games. BOTS was
requires hint, tutor will generate a linkage graph for the par-
inspired by games like LightBot [10] and RoboRally [2], as
tial program and try to find the closest match in MDP. If
well as the success of Scratch and it’s online community [1]
a match is found in MDP, the tutor would generate hint
[5]. In BOTS, players take on the role of programmers writ-
with the next best state based on highest assigned value.
ing code to navigate a simple robot around a grid-based 3D
If a match is not found in current MDP, which means the
environment, as seen in Figure 1. The goal of each puzzle
student is taking a different approach from existing correct
is to press several switches within the environment, which
solutions, the tutor will try to modify those correct solutions
can be done by placing an object or the robot on them.
to fit student’s program and then provide hints.
To program the robots, players will use simple graphical
pseudo-code, allowing them to move the robot, repeat sets
Hint Factory[9] is an automatic hint generation technique
of commands using “for” or “while” loops, and re-use chunks
which uses Markov decision processes (MDPs) to generate
of code using functions. Within each puzzle, players’ scores
contextualized hints from past student data. It mainly con-
depend on the number of commands used, with lower scores
sists of two parts - Markov Decision Process (MDP) gener-
being preferable. In addition, each puzzle limits the maxi-
ator and hint provider. The MDP generator runs a process
mum number of commands, as well as the number of times
to generate MDP values for all states seen in previous stu-
each command can be used. For example, in the tutorial
dents’ solutions. In this process, all the students’ solutions
levels, a user may only use the “Move Forward” instruction
are combined together to form a single graph. Each node
10 times. Therefore, if a player wants to make the robot
of the graph represents a state, and each edge represents an
walk down a long hallway, it will be more efficient to use a
action one student takes to transform from current state to
loop to repeat a single “Move Forward” instruction, rather
another state. Once the graph is built, the MDP generator
than to simply use several “Move Forward” instructions one
uses Bellman backup to assign values for all nodes. After up-
after the other. These constraints are meant to encourage
dating all values, a hint file is generated. The hint provider
players to re-use code and optimize their solutions.
uses hint file to provide hint. When a student asks for a hint
at a existing state, hint provider will retrieve current state
In addition to the guided tutorial mode, BOTS also con-
tains an extensive “Free Play” mode, with a wide selection
of puzzles created by other players. The game, in line with a b
the “Flow of Inspiration” principles outlined by Alexander
Repenning [7], provides multiple ways for players to share
knowledge through authoring and modifying content. Play-
ers are able to create their own puzzles to share with their
peers, and can play and evaluate friends’ puzzles, improv-
ing on past solutions. Features such as peer-authored hints c d
for difficult puzzles, and a collaborative filtering approach
to rating are planned next steps for the game’s online ele-
ment. We hope to create an environment where players can Error
continually challenge their peers to find consistently better
solutions for increasingly difficult problems. e f
User-generated content supports replayability and a sense
of a community for a serious game. We believe that user-
created puzzles could improve interest, encouraging students
to return to the game to solidify their mastery of old skills
and potentially helping them pick up new ones. Figure 2: Several generated hints for a simple puz-
zle. The blue icon represents the robot. The ’X’ icon
4. ANALYSIS represents a goal. Shaded boxes are boxes placed on
goals, while unshaded boxes are not on goals. Hint
4.1 Dataset F in this figure depicts the start and most common
Data for the BOTS studies has come from a middle school goal state of the puzzle.
computer science enrichment program called SPARCS. In
this program, the students attend class on Saturday for 4
hours where computer science undergraduates teach them (start/intermediate) → (intermediate/goal) These transitions
about computational thinking and programming. Students occur when a student moves the robot to an intermediate
attend a total of 7 sessions, each on a different topic, ranging state rather than directly to the goal. Since players run
from security and encryption to game design. The students their programs to produce output, we speculate that these
all attend the same magnet middle school. The demograph- may represent subgoals such asmoving a box onto a specific
ics for this club are 74.2% male, 25.8% female, 36.7% African switch. After accomplishing that task, the user then ap-
American, and 23.3% Hispanic. The student’s grade distri- pends to their program, moving towards a new objective,
bution is 58% 6th grade, 36% 7th grade and 6% 8th grade. until they reach a goal state. Hint B in Figure 2 shows a
hint generated from such a transition.
From these sessions, we collected gameplay data for 20 tu-
torial puzzles as well as 13 user-created puzzles, With this
data, we created an Interaction Network in order to be able 4.2.2 Correction Transition
to provide hints and feedback for future students [3]. How- (error/mistake) → (intermediate/goal) This transition oc-
ever, using program edits as states, the interaction networks curs when a student makes and then corrects a mistake.
produced were very sparse. In order to be better able to These are especially useful because we can offer hints based
relate similar actions, we produced another interaction net- on the type of mistake. Hints D and E in Figure 2 show hints
work using program output as our state definition [6]. built from this type of transition; however, hint E shows a
case where a student resolved the mistake in a suboptimal
way.
4.2 States and Transitions
Based on the data collected, we can divide the set of ob-
served states into classes. First among these is the start state 4.2.3 Simple Solution Transition
in which the problem begins. By definition, every player’s (start) → (goal) This occurs when a student enters an entire,
path must begin at this state. Next is the set of goal states correct program, and solves the puzzle in one attempt. This
in which all buttons on the stage are pressed. These are makes such transitions not particularly useful for generating
reported by the game as correct solutions. Any complete hints, other than showing a potential solution state of the
solution, by definition, ends at one such state. Among states puzzle. Hint F in Figure 2 shows this type of transition.
which are neither start nor goal states, there are three im-
portant classifications: Intermediate states (states a robot
moves through during a correct solution), mistake states
4.2.4 Rethinking Transition
(states a robot does not move through during a correct so- (intermediate) → (intermediate/goal) This transition occurs
lution), and error states (states which result from illegal when rather than appending to the program as in a subgoal
output, like attempting to move the robot out-of-bounds). transition, the user deletes part or all of their program, then
Based on these types of states, we classified our hints based moving towards a new goal. As a result, the first state is
on the transitions they represented. unrelated to the next state the player reaches. Offering this
state as a hint would likely not help guide a different user.
Hint A in Figure 2 shows an example of this. Finding and
4.2.1 Subgoal Transition recognizing these is an important direction for future work.
4.2.5 Error Transition
(start/intermediate) → (mistake/error) This corresponds to
a program which walks the robot out of bounds, into an
object, or other similar errors. While we disregarded these
as hints, this type of transition may still be useful. In such
a case, the last legal output before the error could be a
valuable state. Hint C in Figure 2 is one such case.

4.3 Next Steps
While this approach was able to help us identify interest-
Figure 3: This subgraph shows a short-circuit where
ing transitions, as well as significantly reduce the sparseness
a player bypasses a chain of several steps.
of the Interaction Network by merging states with similar
output, we violate several assumptions of the Hint Factory
technique by using user compilation as an action. Essen- Another problem which arises from this state representation
tially, the cost of an action can vary widely. In the most is seen in Hints C and E above. These hints show states
extreme examples, the best next state selected by Hint Fac- where a student traveled from a state to a worse state before
tory will simply be the goal state. ultimately solving the problem. Since we limit our search for
hintable states to the immediate child states of s in s0 , we are
4.4 Current Hint Policy unable to escape from such a situation if the path containing
Our current hint selection policy is the same as the one used the error is the best or only observable path to the goal.
in the logic tutor Deep Thought with a few exceptions [9].
We combine all student solution paths into a single graph,
mapping identical states to one another (comparing either
4.5 Proposed Hint Policies
One potential modification of the hint policy involves ana-
the programs or the output). Then, we calculate a fitness
lyzing the programs/output on the nodes, using some dis-
value for each node. We assign a large positive value (100)
tance metric δ(s, s0 ). This measurement would be used in
to each goal state, a low value for dead-end states (0) and
addition to the state’s independent fitness value R(s) which
a step cost for each step taken (1). Setting a non-zero cost
takes into account distance from a goal, but is irrespective
on actions biases us towards shorter solutions. We then
of the distance from any previous state. For example in the
calculate fitness values V (s) for each state s, where R(s) is
short-circuit example above, using “difference in number of
the initial fitness value for the state, γ is a discount factor,
lines of code” as a distance metric we could take into ac-
and P (s, s0 ) is the observed frequency with which users in
count how far the “Goal” state is from the “Start” state, and
state s go to state s0 next, via taking the action a. The
potentially choose a nearer state as a hint. This also helps
equation for determining the fitness value of a state is as
correct for small error-correction steps in player solutions;
follows:
if the change between the current state and the target hint
state is very small, we may want to consider hinting toward
the next step instead, or a different solution path altogether.
X
V (s) := R(s) + γmax Pa (s, s0 )V (s0 ) (1)
a
s0
X
V (s) := R(s) + γmax δ(s, s0 )P (s, s0 )V (s0 ) (3)
a
However, in our current representation there is only one s0
available action from any state: “run.” Different players us-
ing this action will change their programs in different ways
One potential downside to this approach is that it requires
between runs, so it is not useful to aggregate across all the
somewhat more knowledge of the domain to be built into the
possible resulting states. Instead, we want to consider each
model. If the distance metric used is inaccurate or flawed,
resulting state on its own. As a result, we use a simplified
there may be cases where we choose a very suboptimal hint.
version of the above, essentially considering each possible
using difference in lines of code as our distance metric, the
resulting state s0 as the definite result of its own action:
change between a state where a player is using no functions
and a state where the user writes existing code into a func-
tion may be very small. Hints selected in these cases might
V (s) := R(s) + γmax
0
P (s, s0 )V (s0 ) (2) guide students away from desired outcomes in our game.
s

Another problem we need to resolve with our current hint
Since the action “run” can encompass many changes, select- policy, as discussed above, is the case where the best or
ing the s0 which maximizes the value may not always be the only path to a goal from a given state s has an error as
best choice for a hint. The difference between s and s0 can a direct child s0 . One method of resolving this could be,
be quite large, and this is usually the case when an expert instead of offering s0 as a hint, continuing to ask for next-
user solves the problem in one try, forming an edge directly step hints from s0 until some s0 is a hintable, non-error state.
between the “start” state and “goal” state. These and other This solution requires no additional knowledge of the game
“short-circuits” make it difficult to assess which of the child domain, however it’s possible that the hint produced will
nodes would be best to offer as a hint by simply using the be very far from s, or that we may skip over important
calculated fitness value. information about how to resolve the error or misconception
that led the student into state s in the first place. [4] W. Jin, T. Barnes, J. Stamper, M. J. Eagle, M. W.
Johnson, and L. Lehmann. Program representation for
Other modifications to the hint selection policy may produce automatic hint generation for a data-driven novice
better results than these. We hope to look into as many programming tutor. In Intelligent Tutoring Systems,
possible modifications as we can, seeing which modifications pages 304–309. Springer, 2012.
produce the most suitable hints on our current dataset be- [5] D. J. Malan and H. H. Leitner. Scratch for budding
fore settling on an implementation for the live version of the computer scientists. ACM SIGCSE Bulletin,
game. 39(1):223–227, 2007.
[6] B. Peddycord III, A. Hicks, and T. Barnes.
5. ACKNOWLEDGMENTS Generating hints for programming problems using
Thanks to the additional developers who have worked on this intermediate output.
project or helped with our outreach activities so far, includ- [7] A. Repenning, A. Basawapatna, and K. H. Koh.
ing Aaron Quidley, Trevor Brennan, Irena Rindos, Vincent Making university education more like middle school
Bugica, Victoria Cooper, Dustin Culler, Shaun Pickford, computer club: facilitating the flow of inspiration. In
Antoine Campbell, and Javier Olaya. This material is based Proceedings of the 14th Western Canadian Conference
upon work supported by the National Science Foundation on Computing Education, pages 9–16. ACM, 2009.
Graduate Research Fellowship under Grant No. 0900860 [8] K. Rivers and K. R. Koedinger. Automating hint
and Grant No. 1252376. generation with solution space path construction. In
Intelligent Tutoring Systems, pages 329–339. Springer,
6. REFERENCES 2014.
[1] I. F. de Kereki. Scratch: Applications in computer [9] J. Stamper, T. Barnes, L. Lehmann, and M. Croy.
science 1. In Frontiers in Education Conference, 2008. The hint factory: Automatic generation of
FIE 2008. 38th Annual, pages T3B–7. IEEE, 2008. contextualized help for existing computer aided
[2] R. Garfield. Roborally. [Board Game], 1994. instruction. In Proceedings of the 9th International
[3] A. Hicks, B. Peddycord III, and T. Barnes. Building Conference on Intelligent Tutoring Systems Young
games to learn from their players: Generating hints in Researchers Track, pages 71–78, 2008.
a serious game. In Intelligent Tutoring Systems, pages [10] D. Yaroslavski. LightBot. [Video Game], 2008.
312–317. Springer, 2014.
Semantic Graphs for Mathematics Word Problems based
on Mathematics Terminology

Rogers Jeffrey Leo John Thomas S. McTavish Rebecca J. Passonneau
Center for Computational Center for Digital Data, Center for Computational
Learning Systems Analytics & Adaptive Learning Learning Systems
Columbia University Pearson Columbia University
New York, NY, USA Austin, TX, USA New York, NY, USA
rl2689@columbia.edu tom.mctavish@pearson.com becky@ccls.columbia.edu

ABSTRACT ercises, we randomly walk a graph using the cosine distance
We present a graph-based approach to discover and extend as edge weights. We also recast the problem as a bipartite
semantic relationships found in a mathematics curriculum graph with exercises on one side and words on the other,
to more general network structures that can illuminate re- providing an edge when an exercise contains the math word.
lationships within the instructional material. Using words We contrast these two different types of random walks and
representative of a secondary level mathematics curriculum find somewhat similar results, which lends confidence to the
we identified in separate work, we constructed two similar- analysis. The bipartite graph walks, however, are more sen-
ity networks of word problems in a mathematics textbook, sitive to differences in word frequency. Casting measures of
and used analogous random walks over the two networks similarity as graphs and performing random walks on them
to discover patterns. The two graph walks provide similar affords more nuanced ways of relating objects, which can be
global views of problem similarity within and across chap- used to build more granular domain models for analysis of
ters, but are affected differently by number of math words prerequisites, instructional design, and adaptive learning.
in a problem and math word frequency.

1. INTRODUCTION 2. RELATED WORK
Curricula are compiled learning objects, typically presented Random walks over graphs have been used extensively to
in sequential order and arranged hierarchically, as in a book’s measure text similarity. Applications include similarity of
Table of Contents. Ideally, a domain model captures rela- web pages [15] and other documents [5], citations [1], pas-
tionships between the learning objects and the knowledge sages [14], person names in email [12] and so on. More re-
components or skills they exercise. Unfortunately, domain cently, general methods that link graph walks with external
models are not often granular enough for optimal learning resources like WordNet have been developed to produce a
experiences. For example, prerequisite relationships may be single system that handles semantic similarity for words,
lacking, or the knowledge components associated with an sentences or text [16]. Very little work compares walks over
exercise may be unknown. In such cases, assessments on graphs of the same content, where the graphs have differ-
those learning objects will be insufficient to enable appro- ent structure. We create two different kinds of graphs for
priate redirection unless expert (i.e. teacher) intervention mathematics word problem and compare the results. We
is explicitly given. Domain models remain coarse because find that the global results are very similar, which is good
using experts to enumerate and relate the knowledge com- evidence for the general approach, and we find differences in
ponents is costly. detail that suggest further investigation could lead to cus-
tomizable methods, depending on needs.
As a means to automatically discover relationships among
learning objects and to reveal their knowledge components, An initiative where elementary science and math tests are a
we demonstrate the use of direct similarity metrics and ran- driver for artificial intelligence has led to work on knowledge
dom graph walks to relate exercises in a mathematics cur- extraction from textbooks. Berant et al. [2] create a system
riculum. We first apply a standard cosine similarity measure to perform domain-specific deep semantic analysis of a 48
between pairs of exercises, based on bag-of-word vectors con- paragraphs from a biology textbook for question answering.
sisting of math terms that we identified in separate work Extracted relations serve as a knowledge base against which
[7]. Then, to extract less explicit relationships between ex- to answer questions, and answering a question is treated as
finding a proof. A shallow approach to knowledge extraction
from a fourth grade science curriculum is taken in [6], and
the knowledge base is extended through dialog with users
until a path in the knowledge network can be found that
supports a known answer. In the math domain, Kushman et
al. [10] generate a global representation of algebra problems
in order to solve them by extracting relations from sentences
and aligning them. Seo et al. [18] study text and diagrams
together in order to understand the diagrams better through
textual cues. We are concerned with alignment of content
Two machines produce the same type of widget. Machine high agreement. All the non-stop words were then labeled
A produces W widgets, X of which are damaged. Machine B by a trained annotator. We developed a supervised machine
produces Y widgets, Z of which are damaged. The fraction of learning approach to classify vocabulary into math and non-
X
damaged widgets for Machine A is W or (simplified fraction). math words [7] that can be applied to new mathematics cur-
Z
The fraction of damaged widgets for Machine B is Y or ricula. For the text used here, there were 577 math terms.
(simplified fraction). Write each fraction as a decimal and a
percent. Use pencil and paper. Select a small percent that
would allow for a small number of damaged widgets. Find
3.3 Random Walks in Graphs
A random walk on a graph starts at a given node and steps
the number of widgets by which each machine exceeded the
with random probability to a neighboring node. The same
acceptable number of widgets.
random decision process is employed at this and every sub-
sequent node until a termination criterion is met. Each time
Figure 1: Sample problem; math terms are in boldface. a node is visited, it is counted. Open random walks require
that the start node and end nodes differ. Traversal methods
may employ a bias to navigate toward or away from certain
across rather than within problems, and our objective is neighbors through edge weights or other graph attributes.
finer-grained analysis of curricula.
In a graph, G = (V, E) with nodes V and edges E, a ran-
Other work that addresses knowledge representation from dom walk that begins at vx and ends at vy can be denoted as
text includes ontology learning [3], which often focuses on (vx , ..., vy ). By performing several random walks, the frac-
the acquisition of sets of facts from text [4]. There has tion of times the node vy is visited converges to the prob-
been some work on linking lexical resources like WordNet ability of target vy being visited given the start node vx ,
or FrameNet to formal ontologies [17, 13], which could pro- which can be expressed as P (vy |vx ) under the conditions of
vide a foundation for reasoning over facts extracted from the walk. In the case of a random walk length of 1, P (vy |vx )
text. We find one work that applies relation mining to e- will simply measure the probability of vy being selected as
learning: Šimko and Bieliková [19] apply automated relation an adjacent node to vx .
mining to extract relations to support e-course authoring in
the domain of teaching functional programming. Li et al.
[11] apply k-means clustering to a combination of problem 3.4 Cosine Similarity Graph
features and student performance features, and propose the Math exercises are represented as bag-of-words vectors with
clusters correspond to Knowledge Components [8]. boolean values to indicate whether a given math term is
present. Cosine similarity quantifies the angle between the
two vectors, and is given by the dot product of two vectors.
3. METHODS Pn
3.1 Data te i=1 ti ei
cos(t, e) = = pPn pPn (1)
We used 1800 exercises from 17 chapters of a Grade 7 mathe- ktkkek i=1 (ti )2 i=1 (ei )
2

matics curriculum. Most are word problems, as illustrated in
Similarity values of 1 indicate that both the vectors are the
Figure 1. They can incorporate images, tables, and graphs,
same whereas a value of zero indicates orthogonality between
but for our analysis, we use only the text. The vocabu-
the two vectors. Pairwise cosine similarities for all 1800
lary of the resulting text consists of 3,500 distinct words.
exercises were computed, yielding a cosine similarity matrix
We construct graphs where math exercises are the nodes, or
Mcos . The matrix corresponds to a graph where non-zero
in a bipartite graph, math exercises are the left side nodes
cosine similarities are edge weights between exercises.
and words are the right side nodes. Our initial focus is on
exercise similarity due to similarity of the math skills that
In a graph walk, the probability that a node vy will be
exercises tap into, and we use mathematics terminology as
reached in one step from a node vx is given by the prod-
an indirect proxy of skills a problem draws upon.
uct of the degree centrality of vx and the normalized edge
weight (vx , vy ). With each exercise as a starting node, we
3.2 Math Terminology performed 100,000 random walks on the cosine-similarity
The text of the word problems includes ordinary language graph, stepping with proportional probability to all outgo-
expressions unrelated to the mathematics curriculum, such ing cosine similarity weights. To measure 2nd degrees of
as the nouns machines, widgets shown in problem in Fig- separation, with each walk we made two steps.
ure 1, or the verbs produces, damaged. For our purposes,
mathematics terminology consists of words that expresses For two math vectors considered as the sets A and B, cosine
concepts that are needed for the mathematical competence similarity can be conceptualized in terms of the intersection
the curriculum addresses. To identify these terms, we devel- set C = A ∪ B and set differences A \ B and B \ A. Cosine
oped annotation guidelines for human annotators who label similarity is high when |C| A \ B and |C| B \ A.
words in their contexts of use, and assessed the reliability of
annotation by these guidelines. Words can be used in the The degree of a node affects the probability of traversing
math texts sometimes in a math sense and sometimes in a any edge from that node. The two factors that affect de-
non-math sense. Annotators were instructed to label terms gree centrality of a start node are the document frequencies
based on the most frequent usage. of its math words, and the total number of math words.
Here, document frequency (df) is the normalized number of
Using a chance-adjusted agreement coefficient in [-1,1] [9], exercises a word occurs in. A high df math word in a prob-
reliability among three annotators was 0.81, representing lem increase its degree centrality because there will be more
Figure 2: Exercise-to-Exercise similarity in Chapter 6. The exercises of Chapter 6 are displayed in row-columns in a square
matrix. The rows represent source nodes and columns represent targets. Each row has been normalized across the book, even
though only Chapter 6 is shown. The axes demarcate the sections of the chapter. Mcos is the cosine similarity. Mcosrw is the
output from the random walk using cosine similarity edge weights, Mbp is the output from the random bipartite walk. Raw
values displayed between 0 and 0.005 corresponding to light and dark pixels, respectively.

problems it can share words with, resulting in non-zero co- the distribution across the row as a probability distribution
sine values and therefore edges. The number of math words to all other nodes for that source node.
in a problem also increases its degree centrality.
4. RESULTS
3.5 Bipartite exercise and word graph We compare the three measures of similarity between ex-
The set of exercises Ve are the left-side nodes and the math ercises: 1) cosine similarity, 2) random walks using cosine
words Vw are the right-side nodes in the undirected bipartite similarity as edge weights, and 3) random walks along a bi-
graph G = (Ve , Vw , E), where an edge exists between vex and partite graph of exercises and words.
vwi if exercise x contains the math word i.

We performed open random walks on this graph to measure
4.1 Exercise-to-Exercise Similarity
We describe exercise-to-exercise similarity with square ma-
similarity between nodes. To measure the similarity of ex-
trices where each exercise is represented as a row-column. A
ercises, we walk in even steps – a step to a connected word
number of features of the measures are embedded in Figure
followed by a step back to one of the exercises that shares
2, which shows heatmaps of color values for pairs of exercises
that word. The degrees of separation between vertices on
in chapter 6 for each matrix. We find that within chapters
the same side of the graph (e.g. exercise-to-exercise) will be
and especially within sections of those chapters, there is a
l/2 where l is the length of the walk. In this paper, we ex-
high degree of similarity between exercises regardless of the
plored first and second degrees of separation so our bipartite
measure. This demonstrates that words within sections and
graphs had a walk length of 4.
chapters share a common vocabulary. We can see that Mcos
Table 1: Summary statistics of the similarity distributions has more extreme values than Mcosrw ; as explained below,
it has both more zero cosine values, and more very high
cosine rwcos rwbp values. This is most likely because Mcosrw , from doing the
minimum 0 0 0 walk, picks up exercises that are another degree of separa-
maximum 6.3 ×10−2 0.50 0.11 tion away. When the row of the matrix is normalized to
mean 5.55 ×10−4 5.55 ×10−4 5.55 ×10−4 capture the distribution of the source node, the otherwise
median 0 2.41 ×10−4 2.06 ×10−4 high values from Mcos are tempered in the Mcosrw matrix.
std. dev. 1.19 ×10−3 8.57 ×10−4 1.24 ×10−3 This shift to a large number of lower scores is shown in the
bottom panel of Figure 3. Mbp and Mcosrw are very similar,
Because exercise nodes are connected via word nodes, we in- but Mbp generally has a wider dynamic range.
terpret the fraction of node visits as a similarity measure be-
tween the source node and any node visited. We performed 4.2 Comparison of the Graph Walks
100,000 random walks from each node. Exercise-to-exercise Table 1 provides summary statistics for cosine similarity and
similarity can be visualzed as square matrices with source the two random walks for all pairs of problems (N=3,250,809).
nodes in the rows and target nodes in the columns. To fac- The cosine matrix is very sparse, as shown by the median
tor out the times a source may have been selected as one of value of 0. Of the two random walk similarities, rwcos has
the targets, we set the diagonal of the matrix to zero. We a lower standard deviation around the mean, but otherwise
then normalized across the rows so that we could interpret the two random walks produce similar distributions.
The similarity values given by cosine and the cosine random Table 3: Two pairs of problems with high cosine similarity
walk will increasingly differ the more that the start problem and reverse patterns of graph walk similarity. The first pair,
has relatively higher degree centrality due either to more from section 5, have lower than average rwcos and higher
words or higher frequency of words in exercises (df). For than average rwbp due to relatively many words in common
reference, the word that occurs most frequently, number, (12 and 14). The second pair, from section 6, have higher
has a df of 0.42, and the second most frequent occurs in than average rwcos and lower than average rwbp due to rel-
only 15% of the exercises. Fifty eight nodes have no edges atively few words in common.
(0 degree), the most frequent number of edges is 170, and
the maximum is 1,706. Table 2 gives the summary statistics Prob 1 Prob 2 cosine rwcos rwbp N max df
for df, number of math words, and degree centrality. 6.5.99 6.5.85 1.0000 0.0032 0.0102 12 0.42
6.5.94 6.5.83 0.8819 0.0026 0.0064 14 0.13
Inspection of the data shows that for pairs of problems in 6.6.109 6.6.102 0.8660 0.0068 0.0037 3 0.11
the two chapters for our case study, if the cosine similar- 6.6.104 6.6.102 0.7746 0.0068 0.0029 3 0.11
ity between a pair is high (≥ 0.75), the similarity values
for rwcos tend to go down as the number of shared word
increases from 3 to between 5 and 7. For the rwbp , the cosine similarity is less than 0.20, the two walks produce
opposite trend occurs, where the similarity goes up as the very similar results. The average similarity values for the
number of words increases. This difference helps account for bipartite walk are about 20% higher, and the maximum val-
an observed divergence in the two graph walks for sections ues are higher, but the two walks produce similar means,
5 and 6 of Chapter 6. independent of the lengths of the common word vectors, or
the total number of math words.
Table 3 illustrates two pairs of problems from section 5 that
have high cosine similarities, and relatively higher rwbp sim- Since we normalized the matrices across rows, which are
ilarities (greater than the rw means of 0.0055) and relatively the source nodes, differences between the bipartite matrix,
lower rwcos (lower than the rw means). The reverse pattern Mbp , and the cosine matrices implied that the degree of the
is seen for two pairs of problems from section 6 that have target node had a greater impact on the variability in the
high cosine similarities. These problems have higher than bipartite matrix. To measure the impact of the edge degree
average rwcos and lower than average rwbp . What differen- on the target nodes, we considered the column sum for those
tiates the two pairs of problems is that the section 5 prob- targets that had 1 edge, those that had 2, etc. up to 20
lems have a relatively large number of words in common: edges. The results are summarized in Figure 4. As can be
14 for the first pair, 12 for the second pair. In both pairs, seen, the column sum varies linearly by the number of target
some of the words have relatively high document frequency. edges in the bipartite matrix, whereas the cosine matrices
As discussed above, these two properties increase the degree do not. We found the cubed root of the column sum in Mbp
centrality of the start node of a step in the rwcos graph, and approaches the distribution of column sums of the cosine
thus lower the probability of hitting each of the start node’s matrices, which is provided in Figure 4.
one-degree neighbors. This effect propagates along the two
steps of the walk. For the rwbp graph, however, as the num-
ber of shared math words for a pair of problems increases,
the number of paths from one to the other also increases,
thus raising the probability of the traversal. This effect also
propagates through a two-step walk. In contrast to the sec-
tion 5 problems, the two section 6 problems have relatively
fewer words in common: 3 for both pairs.

For problem pairs where the cosine similarity is between
0.40 and 0.60, the mean similarity from rwbp is 30% higher
than for rwcos for when the number of math words in com-
mon is 3 (0.0033 vs. 0.0043), 80% higher when the number
of math words in common is 6 (0.0024 versus 0.0045), and Figure 3: Tail distribution of similarity values in Mcos ,
three as high when the number of math words in common Mcosrw , and Mbp . Because 62% of the values in Mcos are 0,
is 9 (0.0023 versus 0.0068). For problems pairs where the the plot shows only non-zero values.

5. CONCLUSION
Table 2: Summary statistics of document frequency (df) of
Visualization of the three similarity matrices shows they re-
math words, number of math words in problems, and degree
veal the same overall patterns, thus each is confirmed by
centrality of rwcos
the others. However, the bipartite walk was the most sen-
sitive to word frequency across exercises, and the number
df math words degree ctr. rwcos
of words in problems. With our goal of automatically dis-
minimum 5.54 ×10−4 1 0 covering knowledge components and identifying their rela-
maximum 0.424 24 1,706 tionships, the random walk that stepped in proportion to
mean 0.183 8.35 418.4 its cosine similarity performed best. It was able to discover
median 6.66 ×10−3 8.00 340 second-degree relationships that seem reasonable as we ex-
std. dev. 0.314 3.64 317.1 plore by eye those matches. Future work will test these re-
knowledge-learning-instruction (KLI) framework:
Toward bridging the science-practice chasm to
enhance robust student learning. Cognitive Science,
36(5):757–798, 2012.
[9] K. Krippendorff. Content analysis: An introduction to
its methodology. Sage Publications, Beverly Hills, CA,
1980.
[10] N. Kushman, Y. Artzi, L. Zettlemoyer, and
R. Barzilay. Learning to automatically solve algebra
word problems. In Proceedings of the 52nd Annual
Meeting of the Association for Computational
Linguistics (Volume 1: Long Papers), pages 271–281,
Baltimore, Maryland, June 2014. Association for
Figure 4: Distribution of column sums by number of edges
Computational Linguistics.
in the target node represented by the column. Error plots
show the mean and standard error for each type. Black line [11] N. Li, W. W. Cohen, and K. R. Koedinger.
is the cubed root of the mean of the column sums of Mbp . Discovering student models with a clustering
algorithm using problem content. In Proceedings of the
6th International Conference on Educational Data
lationships with student performance data. We should find, Mining, 2013.
for example, that if two exercises are conceptually similar, [12] E. Minkov, W. W. Cohen, and A. Y. Ng. Contextual
then student outcomes should also be similar and learning search and name disambiguation in email using
curves should reveal shared knowledge components. In this graphs. In Proceedings of the 29th Annual
respect, such automatically constructed knowledge graphs International ACM SIGIR Conference on Research
can create more refined domain models that intelligent tu- and Development in Information Retrieval, SIGIR ’06,
toring systems and robust assessments can be built upon. pages 27–34, New York, NY, USA, 2006. ACM.
[13] I. Niles and A. Pease. Mapping WordNet to the
6. REFERENCES SUMO ontology. In Proceedings of the IEEE
[1] Y. An, J. Janssen, and E. E. Milios. Characterizing International Knowledge Engineering Conference,
and mining the citation graph of the computer science pages 23–26, 2003.
literature. Knowledge and Information Systems, [14] J. Otterbacher, G. Erkan, and D. R. Radev. Biased
6(6):664–678, 2004. lexrank: Passage retrieval using random walks with
[2] J. Berant, V. Srikumar, P.-C. Chen, question-based priors. Information Processing and
A. Vander Linden, B. Harding, B. Huang, P. Clark, Management, 45(1):42–54, 2009.
and C. D. Manning. Modeling biological processes for [15] L. Page, S. Brin, R. Motwani, and T. Winograd. The
reading comprehension. In Proceedings of the 2014 pagerank citation ranking: Bringing order to the web.
Conference on Empirical Methods in Natural Language Technical Report 1999-66, Stanford InfoLab,
Processing (EMNLP), pages 1499–1510, Doha, Qatar, November 1999. Previous number =
October 2014. Association for Computational SIDL-WP-1999-0120.
Linguistics. [16] T. M. Pilehvar, D. Jurgens, and R. Navigli. Align,
[3] P. Buitelaar, A. Frank, M. Hartung, and S. Racioppa. disambiguate and walk: A unified approach for
Ontology-based information extraction and integration measuring semantic similarity. In Proceedings of the
from heterogeneous data sources, 2008. 51st Annual Meeting of the Association for
[4] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. H. Computational Linguistics (Volume 1: Long Papers),
Jr., and T. M. Mitchell. Toward an architecture for pages 1341–1351. Association for Computational
never-ending language learning. In Proceedings of the Linguistics, 2013.
24th Conference on Artificial Intelligence (AAAI), [17] J. Scheffczyk, A. Pease, and M. Ellsworth. Linking
volume 2, pages 1306–1313, 2010. FrameNet to the suggested upper merged ontology. In
[5] G. Erkan and D. R. Radev. Lexrank: Graph-based B. Hennett and C. Fellbaum, editors, Formal Ontology
lexical centrality as salience in text summarization. J. in Information Systems, pages 289–. IOS Press, 2006.
Artif. Int. Res., 22(1):457–479, Dec. 2004. [18] M. J. Seo, H. Hajishirzi, A. Farhadi, and O. Etzioni.
[6] B. Hixon, P. Clark, and H. Hajishirzi. Learning Diagram understanding in geometry questions. In
knowledge graphs for question answering through Proceedings of the 28th AAAI Conference on Artificial
conversational dialog. In Proceedings of the 2013 Intelligence, 2014.
Conference of the North American Chapter of the [19] M. Šimko and M. Bieliková. Automatic concept
Association for Computational Linguistics: Human relationships discovery for an adaptive e-course. In
Language Technologies, Denver, CO, May-June 2015. Proceedings of the Second International Conference on
[7] R. J. L. John, R. J. Passonneau, and T. S. McTavish. Educational Data Mining (EDM), pages 171–179,
Semantic similarity graphs of mathematics word 2009.
problems: Can terminology detection help? In
Proceedings of the Eighth International Conference on
Educational Data Mining, 2015.
[8] K. R. Koedinger, A. T. Corbett, and C. Perfetti. The