=Paper= {{Paper |id=Vol-1446/GEDM2015_proc |storemode=property |title=None |pdfUrl=https://ceur-ws.org/Vol-1446/GEDM2015_proc.pdf |volume=Vol-1446 |dblpUrl=https://dblp.org/rec/conf/edm/XueLC15 }} ==None== https://ceur-ws.org/Vol-1446/GEDM2015_proc.pdf
        Graph-based Educational Data Mining (G-EDM 2015)

                                         Collin F. Lynch                        Dr. Tiffany Barnes
                                     Department of Computer                 Department of Computer
                                            Science                                Science
                                      North Carolina State                   North Carolina State
                                           University                             University
                                     Raleigh, North Carolina                Raleigh, North Carolina
                                      cflynch@ncsu.edu                      tmbarnes@ncsu.edu
                                      Dr. Jennifer Albert                       Michael Eagle
                               Department of Computer Science                   Department of Computer
                                North Carolina State University                        Science
                                   Raleigh, North Carolina                       North Carolina State
                                       jlsharp@ncsu.edu                               University
                                                                                Raleigh, North Carolina
                                                                                mjeagle@ncsu.edu

1.    INTRODUCTION                                                            Thus, graphs are simple in concept, general in structure, and
Fundamentally, a graph is a simple concept. At a basic level a             have wide applications for Educational Data Mining (EDM).
graph is a set of relationships {e(n0 ,n2 ),e(n0 ,nj ),...,e(nj−1 ,nj )}   Despite the importance of graphs to data mining and data anal-
between elements. This simple concept, however, has afforded the           ysis there exists no strong community of researchers focused on
development of a complex theory of graphs [1] and rich algorithms          Graph-Based Educational Data Mining. Such a community is
for combinatorics [7] and clustering [4]. This has, in turn, made          important to foster useful interactions, share tools and techniques,
graphs a fundamental part of educational data mining.                      and to explore common problems.

   Many types of data can be naturally represented as graphs such          2.    GEDM 2014
as social network data, user-system interaction logs, argument             This is the second workshop on Graph-Based Educational Data
diagrams, logical proofs, and forum discussions. Such data has             Mining. The first was held in conjunction with EDM 2014 in
grown exponentially in volume as courses have moved online and             London [17]. The focus of that workshop was on seeding an initial
educational technology has been incorporated into the traditional          community of researchers, and on identifying shared problems, and
classroom. Analyzing it can help to answer a range of important            avenues for research. The papers presented covered a range of top-
questions such as:                                                         ics including unique visualizations [13], social capital in educational
                                                                           networks [8], graph mining [19, 11], and tutor construction [9].
• What path(s) do high-performing students take through online
  educational materials?                                                      The group discussion sections at that workshop focused on the
• What social networks can foster or inhibit learning?                     distinct uses of graph data. Some of the work presented focused
• Do users of online learning tools behave as the system designers         on student-produced graphs as solution representations (e.g. [14,
  expect?                                                                  3]) while others focused more on the use of graphs for large-scale
• What diagnostic substructures are commonly found in student-             analysis to support instructors or administrators (e.g. [18, 13]).
  produced diagrams?                                                       These differing uses motivate different analytical techniques and,
• Can we use prior student data to identify students’ solution             as participants noted, change our underlying assumptions about
  plan, if any?                                                            the graph structures in important ways.
• Can we use prior student data to provide meaningful hints in
  complex domains?
• Can we identify students who are particularly helpful based              3.    GEDM 2015
  upon their social interactions?                                          Our goal in this second workshop was to build upon this nascent
                                                                           community structure and to explore the following questions:

                                                                           1. What common goals exist for graph analysis in EDM?
                                                                           2. What shared resources such as tools and repositories are re-
                                                                              quired to support the community?
                                                                           3. How do the structures of the graphs and the analytical methods
                                                                              change with the applications?

                                                                              The papers that we include here fall into four broad categories:
                                                                           interaction, induction, assessment, and MOOCs.
  Work by Poulovassilis et al. [15] and Lynch et al. [12] focuses           Data Mining 2014, co-located with 7th International
on analyzing user-system interactions in state based learning               Conference on Educational Data Mining (EDM
environments. Poulovassilis et al. focuses on the analyses of               2014), London, United Kingdom, July 4-7, 2014., volume
individual users’ solution paths and presents a novel mechanism             1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
to query solution paths and identify general solution strategies.       [4] M. Girvan and M. E. J. Newman. Community
Lynch et al. by contrast, examined user-system interactions from            structure in social and biological networks. Proc. of the
existing model-based tutors to examine the impact of specific               National Academy of Sciences, 99(12):7821–7826, June 2002.
design decisions on student performance.                                [5] J. Guerra. Graph analysis
                                                                            of student model networks. In C. F. Lynch, T. Barnes,
   Price & Barnes [16] and Hicks et al. [6] focus on applying these         J. Albert, and M. Eagle, editors, Proceedings of the Second
same analyses in the open-ended domain of programming. Unlike               International Workshop on Graph-Based Educational Data
more discrete tutoring domains where users enter single equations           Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
or select actions, programming tutors allow users to make drastic       [6] A. Hicks, V. Catete, R. Zhi,
changes to their code on each step. This can pose challenges for            Y. Dong, and T. Barnes. Bots: Selecting next-steps from
data-driven methods as the student states are frequently unique             player traces in a puzzle game. In C. F. Lynch, T. Barnes,
and admit no easy single-step advice. Price and Barnes present a            J. Albert, and M. Eagle, editors, Proceedings of the Second
novel method for addressing the data sparsity problem by focusing           International Workshop on Graph-Based Educational Data
on minimal-distance changes between users [16] while in related             Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
work Hicks et al. focuses on the use of path weighting to select        [7] D. E. Knuth. The
actionable advice in a complex state space [6].                             Art of Computer Programming: Combinatorial Algorithms,
                                                                            Part 1, volume 4A. Addison-Wesley, 1st edition, 2011.
  The goal in much of this work is to identify rules that can           [8] V. Kovanovic, S. Joksimovic, D. Gasevic, and M. Hatala.
be used to characterize good and poor interactions or good and              What is the source of social capital? the association
poor graphs. Xue at al. sought address this challenge in part via           between social network position and social presence
the automatic induction of graph rules for student-produced dia-            in communities of inquiry. In S. G. Santos and O. C. Santos,
grams [22]. In their ongoing work they are applying evolutionary            editors, Proceedings of the Workshops held at Educational
computation to the induction of Augmented Graph Grammars,                   Data Mining 2014, co-located with 7th International
a graph-based formalism for rules about graphs.                             Conference on Educational Data Mining (EDM
                                                                            2014), London, United Kingdom, July 4-7, 2014., volume
   The work described by Leo-John et al. [10], Guerra [5] and               1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
Weber & Vas [21], takes a different tack and focuses not on graphs      [9] R. Kumar. Cross-domain performance of automatic tutor
representing solutions or interactions but on relationships. Leo-           modeling algorithms. In S. G. Santos and O. C. Santos,
John et al. present a novel approach for identifying closely-related        editors, Proceedings of the Workshops held at Educational
word problems via semantic networks. This work is designed to               Data Mining 2014, co-located with 7th International
support content developers and educators in examining a set of              Conference on Educational Data Mining (EDM
questions and in giving appropriate assignments. Guerra takes               2014), London, United Kingdom, July 4-7, 2014., volume
a similar approach to the assessment of users’ conceptual changes           1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
when learning programming. He argues that the conceptual
                                                                       [10] R.-J. Leo-John, T. McTavish, and R. Passonneau.
relationship graph affords a better mechanism for automatic as-
                                                                            Semantic graphs for mathematics word problems based
sessment than individual component models. This approach is
                                                                            on mathematics terminology. In C. F. Lynch, T. Barnes,
also taken up by Weber and Vas who present a toolkit for graph-
                                                                            J. Albert, and M. Eagle, editors, Proceedings of the Second
based self-assessment that is designed to bring these conceptual
                                                                            International Workshop on Graph-Based Educational Data
structures under students’ direct control.
                                                                            Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
   And finally, Vigentini & Clayphan [20], and Brown et al. [2]        [11] C. F. Lynch. AGG: augmented graph grammars for complex
focus on the unique problems posed by MOOCs. Vigentini and                  heterogeneous data. In S. G. Santos and O. C. Santos,
Clayphan present work on the use of graph-based metrics to                  editors, Proceedings of the Workshops held at Educational
assess students’ on-line behaviors. Brown et al., by contrast, focus        Data Mining 2014, co-located with 7th International
not on local behaviors but on social networks with the goal of              Conference on Educational Data Mining (EDM
identifying stable sub-communities of users and of assessing the            2014), London, United Kingdom, July 4-7, 2014., volume
impact of social relationships on users’ class performance.                 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
                                                                       [12] C. F. Lynch, T. W. Price,
                                                                            M. Chi, and T. Barnes. Using the hint factory to analyze
4.    REFERENCES                                                            model-based tutoring systems. In C. F. Lynch, T. Barnes,
 [1] B. Bollobás.
                                                                            J. Albert, and M. Eagle, editors, Proceedings of the Second
     Modern Graph Theory. Springer Science+Business
                                                                            International Workshop on Graph-Based Educational Data
     Media Inc. New York, New York, U.S.A., 1998.
                                                                            Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
 [2] R. Brown, C. F. Lynch, Y. Wang,
                                                                       [13] T. McTavish. Facilitating graph interpretation via interactive
     M. Eagle, J. Albert, T. Barnes, R. Baker, Y. Bergner,
                                                                            hierarchical edges. In S. G. Santos and O. C. Santos,
     and D. McNamara. Communities of performance
                                                                            editors, Proceedings of the Workshops held at Educational
     & communities of preference. In C. F. Lynch, T. Barnes,
                                                                            Data Mining 2014, co-located with 7th International
     J. Albert, and M. Eagle, editors, Proceedings of the Second
                                                                            Conference on Educational Data Mining (EDM
     International Workshop on Graph-Based Educational Data
                                                                            2014), London, United Kingdom, July 4-7, 2014., volume
     Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
                                                                            1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
 [3] R. Dekel and K. Gal. On-line plan recognition in exploratory
                                                                       [14] B. Mostafavi and T. Barnes. Evaluation of logic proof problem
     learning environments. In S. G. Santos and O. C. Santos,
                                                                            difficulty through student performance data. In S. G. Santos
     editors, Proceedings of the Workshops held at Educational
     and O. C. Santos, editors, Proceedings of the Workshops             Data Mining 2014, co-located with 7th International
     held at Educational Data Mining 2014, co-located with 7th           Conference on Educational Data Mining (EDM
     International Conference on Educational Data Mining (EDM            2014), London, United Kingdom, July 4-7, 2014., volume
     2014), London, United Kingdom, July 4-7, 2014., volume              1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
     1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.          [19] K. Vaculı́k, L. Nezvalová, and L. Popelı́nsky. Graph mining
[15] A. Poulovassilis, S. G. Santos, and M. Mavrikis. Graph-based        and outlier detection meet logic proof tutoring. In S. G. Santos
     modelling of students’ interaction data from exploratory            and O. C. Santos, editors, Proceedings of the Workshops
     learning environments. In C. F. Lynch, T. Barnes,                   held at Educational Data Mining 2014, co-located with 7th
     J. Albert, and M. Eagle, editors, Proceedings of the Second         International Conference on Educational Data Mining (EDM
     International Workshop on Graph-Based Educational Data              2014), London, United Kingdom, July 4-7, 2014., volume
     Mining (GEDM 2015). CEUR-WS, June 2015. (in press).                 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.
[16] T. Price and T. Barnes. An                                     [20] L. Vigentini and A. Clayphan. Exploring the function
     exploration of data-driven hint generation in an open-ended         of discussion forums in moocs: comparing data mining
     programming problem. In C. F. Lynch, T. Barnes,                     and graph-based approaches. In C. F. Lynch, T. Barnes,
     J. Albert, and M. Eagle, editors, Proceedings of the Second         J. Albert, and M. Eagle, editors, Proceedings of the Second
     International Workshop on Graph-Based Educational Data              International Workshop on Graph-Based Educational Data
     Mining (GEDM 2015). CEUR-WS, June 2015. (in press).                 Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
[17] S. G. Santos and O. C. Santos,                                 [21] C. Weber and R. Vas. Studio: Ontology-based
     editors. Proceedings of the Workshops held at Educational           educational self-assessment. In C. F. Lynch, T. Barnes,
     Data Mining 2014, co-located with 7th International                 J. Albert, and M. Eagle, editors, Proceedings of the Second
     Conference on Educational Data Mining (EDM                          International Workshop on Graph-Based Educational Data
     2014), London, United Kingdom, July 4-7, 2014, volume               Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
     1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.          [22] L. Xue, C. F. Lynch, and M. Chi. Graph grammar induction
[18] V. Sheshadri, C. Lynch, and T. Barnes.                              by genetic programming. In C. F. Lynch, T. Barnes,
     Invis: An EDM tool for graphical rendering and analysis of          J. Albert, and M. Eagle, editors, Proceedings of the Second
     student interaction data. In S. G. Santos and O. C. Santos,         International Workshop on Graph-Based Educational Data
     editors, Proceedings of the Workshops held at Educational           Mining (GEDM 2015). CEUR-WS, June 2015. (in press).
     Graph Grammar Induction via Evolutionary Computation

                   Linting Xue                        Collin F.Lynch                         Min Chi
              Electrical Engineering                Computer Science                  Computer Science
                   Department                           Department                        Department
              North Carolina State                 North Carolina State              North Carolina State
                    University                           University                        University
             Raleigh, North Carolina,             Raleigh, North Carolina,          Raleigh, North Carolina,
                      U.S.A.                               U.S.A.                            U.S.A.
               lxue3@ncsu.edu                      cflynch@ncsu.edu                    mchi@ncsu.edu

ABSTRACT                                                        In this paper we describe our ongoing work on the automatic
Augmented Graph Grammars provide a robust formalism             induction of Augmented Graph Grammars via Evolutionary
for representing and evaluating graph structures. With the      Computation (EC). Our long-term goal in this work is to
advent of robust graph libraries such as AGG, it has be-        develop automated techniques that can extract empirically-
come possible to use graph grammars to analyze realistic        valid graph rules which can, in turn, be used to classify
data. Prior studies have shown that graph rules can be used     student-produced argument diagrams and to provide the ba-
to evaluate student work and to identify empirically-valid      sis for automated student guidance and evaluation. This will
substructures using hand-authored rules. In this paper we       build upon our prior on the evaluation of a-priori rules for
describe proposed work on the automatic induction of graph      student arguments. We will begin with background material
grammars for student data using evolutionary computation        on Augmented Graph Grammars and discuss prior work on
via the pyEC system.                                            grammar induction. We will then present an overview of our
                                                                planned work.
Keywords
Augmented Graph Grammars, Graph Data, Evolutionary              2.   AUGMENTED GRAPH GRAMMARS
Computation                                                          & ARGUMENT DIAGRAMS
                                                                Classical graph grammars are designed to deal with fixed
1.    INTRODUCTION                                              graphs that are composed from a finite set of static node and
Graph Grammars are logical rule representations for graph       arc types. Augmented graph grammars are an extension of
structures. They can be designed to encode classes of suit-     simple graph grammars that allow for complex node and arc
able graphs to recognize complex sub-features. They were        types, optional substructures, and complex rule expressions
introduced by Rosenfeld and Pfaltz in 1969 as “Context-free     [12]. Rather than using a fixed alphabet of graph compo-
web grammars” [1]. Since then Graph grammars have been          nents they are defined by a complex ontology that allows for
applied to a wide range of areas, including pattern recogni-    subsidiary types such as textual fields, access functions, and
tion [2, 3, 4]; visual programming languages [5]; biological    directional information. They can also be used to evaluate
development [6]; classification of chemical compounds [7, 8];   negated elements as well as quantified expressions. As such
and social network analysis [9, 10, 11]. Simple graph gram-     they are better suited to rich graph data such as user-system
mars are, like string grammars, composed of a set of pro-       interaction logs and student-produced argument diagrams.
duction rules that map from one structure to another. In
this case the rules map from a simple subgraph, typically a     Augmented Graph Grammars have previously been used for
single node or arc, to a more complex structure. As with        the detection of empirically-valid substructures in student-
string grammars the node and arc types are drawn from           produced argument diagrams [13, 14]. In that work a-priori
finite alphabets. Despite their utility, however, graph gram-   rules were used to represent key discussion features and ar-
mars are very difficult to construct. The data structures       gumentative flaws. Argument diagrams are graphical ar-
required are complex [5]. Moreover, development of suitable     gument representations that reify key features of arguments
graph grammars generally requires considerable domain ex-       such as hypothesis statements, claims, and citations as nodes
pertise. Most existing uses of graph grammars have relied       and the supporting, opposing, and informational relation-
on hand-authored rules.                                         ships as arcs between them.

                                                                A sample diagram collected in that work is shown in Figure
                                                                1 The diagram includes a central research claim node, which
                                                                has a single text field indicating the content of the research
                                                                claim. A set of citation nodes are connected to the claim
                                                                node via a set of supporting, opposing and undefined arcs
                                                                colored with green, red and blue respectively. Each citation
                                                                node contains two fields: one for the citation information,
                                                                and the other for a summary of the cited work; each arc has
                                                                a single text field explaining why the relationship holds. At
      Figure 1: A student-produced LASAD argument diagram representing an introductory argument.


the bottom of the diagram, there is a single isolated hypoth-                                            t
esis node that contains two text fields, one for a conditional
or IF field, and the other for conditional or THEN field. We
expect the induced graph grammars from a set of argument           (P aredW comp)                    O        S
diagrams can be used to evaluate the student thesis work.
                                                                                                 a       ¬c       b
Figure 2 shows an a-priori rule that was defined as part
of that work. This rule is designed to identify a subgraph                            
                                                                                        t.T pye = “claim00 or“hypothesis00 
                                                                                                                           
where a single target node t is connected to two separate                             
                                                                                               a.T ype = “Citation00
                                                                                                                          
citation nodes a and b such that: a is connected to t via an
                                                                                                                          
                                                                                                                   00
opposing path; b is connected via a supporting path; and                                      b.T ype = “Citation         
                                                                                             c.T ype = “Comparison00
                                                                                                                          
there exists no comparison arc between a and b. The rules
                                                                                                                          
in that study were implemented using AGG an augmented
graph grammar library built in Python [12]. AGG matches
                                                                  Figure 2: A simple augmented graph grammar rule
the graphs using recursive stack-based algorithm. The code
                                                                  that detects compared counterarguments. The rule
first matches the ground nodes at the top-level of the class
                                                                  shows a two citation nodes (a, & b) that have oppos-
(t, a, & b). It then tests for the recursive productions O,
                                                                  ing relationships with a shared claim node (t) and do
and S, before finally testing for the negated comparison arc
                                                                  not have a comparison arc (c) drawn between them.
c. This rule does not make use of the full range of potential
                                                                  The arcs S and O represent recursive supporting
capacity for Augmented Graph Grammars. However it is
                                                                  and opposing path.
illustrative of the type of rules we plan to induce here, rules
that generalize beyond basic types and draw on existing pro-
duction classes but not, at least in the immediate term, use
complex textual elements or functional features.
                                                                  algorithm [19]. They are based upon controlled graph walks
                                                                  coupled with indexing. While these algorithms are effective,
3.   GRAMMAR INDUCTION                                            particularly on smaller graphs, with low vertex degree they
Graph and relational data has grown increasingly prevalent        can also overfit simpler graph structures and they do not
and graph analysis algorithms have been applied in a wide         scale well to larger, denser graph data [20].
range of domains from social network analysis [15] to bioin-
formatics [16]. Most of this work falls into one of two cate-     The SUBDUE system takes a greedy-compression approach
gories of algorithms: frequent subgraph matching, and graph       to graph mining. SUBDUE searches for candidate sub-
compression.                                                      graphs that can best compress the input graphs by replacing
                                                                  a candidate subgraph with a single vertex. Then nodes and
A number of algorithms have been developed to discover fre-       arcs are added to the vertices to form new candidate sub-
quent subgraphs. These include the gSpan algorithm devel-         graphs. The process is recursive and relies on the Minimum-
oped by Yan and Han [17]; the AGM algorithm developed by          Description-Length (MDL) principle to evaluate the candi-
Inokuchi et al [18]; and the FSG algorithm developed by Ku-       dates. SUBDUE has been applied successfully to extract
ramochi and Karypis which is based on the previous Apriori        structure from visual programming [5], web search [21], and
analyzing user behaviors in games [22].                          lution space. Therefore it is well suited to our present needs.
                                                                 This flexibility is costly, however, as EC is far more compu-
While these methods are successful they have practical and       tationally expensive than more specialized algorithms, and
theoretical limitations. Both classes of approaches are lim-     applications of EC can require a great deal of tuning for
ited to static graphs composed from a finite alphabet of node    each use. In the subsections below we will describe the fit-
and arc types. The frequent subgraph approaches are based        ness function and the operators that we will use in this work.
upon iterative graph walks and can be computationally ex-        For this work we will rely on pyEC a general purpose evo-
pensive and are limited to finding exact matches. They do        lutionary computation engine that we have developed [26].
not generalize beyond the exact graphs matched nor do they
allow for recursive typing. SUBDUE, by contrast is a greedy      4.2    Dataset
algorithm that finds the single most descriptive grammar         Our initial analysis will be based upon a corpus of expert
and does not allow for weighted matches.                         graded student produced argument diagrams and essays pre-
                                                                 viously described in [13, 14]. That dataset was collected as
For our present purposes, however, our goal is to identify       part of a study on students’ use of argument diagrams for
multiple heirarchical classes of the type shown in Figure 2      writing that was conducted at the University of Pittsburgh
that can: generalize beyond exact node and arc types; can        in 2011. For that study we selected a set of students in an
draw on recursive rule productions; and can be weighted          undergraduate-level course on Psychological Research Meth-
based upon the graph quality. Moreover our long-term goal        ods. As part of the course the students were tasked with
with this work is to explore graph rule induction mechanisms     planning and executing an empirical research study and then
that can be expanded to include textual rules and complex        drafting a written report. The students were permitted to
constraints. For that reason we have elected to apply evo-       work individually or in teams. This report was structured as
lutionary computation. This is a general-purpose machine         a standard empirical workshop paper. Prior to drafting the
learning mechanism that can be tunes to explore a range of       report the students were tasked with diagramming the ar-
possible induction mechanisms.                                   gument that they planned to make using LASAD an online
                                                                 tool for argument diagramming and annotation.
4.    METHODS
                                                                 Subsequent to this data collection process the diagrams and
4.1   Evolutionary Computation                                   essays were graded by an experienced TA using a set of par-
Evolutionary Computation (EC) is a general class of ma-          allel grading rubrics. These rubrics focused on the quality of
chine learning and optimization methods that are inspired        the arguments in the diagrams and essays and were used to
by the process of Darwinian evolution through natural se-        demonstrate that the structure and quality of the diagrams
lection [23] such as Genetic Algorithms [24] or Genetic Pro-     can be used to predict the students’ subsequent essay per-
gramming [25]. EC algorithms begin with a population of          formance. These grades will be used as the weighting metric
randomly generated candidate solutions such as snippets of       for the diagrams and will be correlated with performance as
random code, strings representing a target function, or for-     part of the fitness function we describe below. After comple-
mal rules. Each of these solutions is ranked by a fitness        tion of the data collection, grading, and testing phases and
function that is used to evaluate the quality of the individu-   accounting for student dropout and incomplete assignments
als. These functions can be defined by absolute measures of      we collected 105 graded diagram-essay pairs 74 of which were
success such as a suite of test cases, or by relative measures   authored by teams.
such as a competition between chess-playing systems.
                                                                 4.3    Solution Representation
Once the individuals have been ranked a new generation of        For the purposes of our present experiments we will use a
individuals is produced through a combination of crossover       restricted solution representation that relies on a subset of
and mutation operations. Crossover operations combine two        the augmented graph grammar formalism exemplified by the
or more parents to produce one or more candidate children.       rule shown in Figure 2. This will include only element types
In Genetic Algorithms where the candidate solutions are rep-     and recursive productions. In future work we plan to sup-
resented as strings this can be accomplished by splitting two    port the induction of more complex rules defined by multiple
parents at a given index and then exchanging the substrings      graph classes, novel productions, and expressions. However
to produce novel children. In Genetic Programming the par-       for the present study we will focus on the simple case of
ents exchange blocks of code, functions, or subtrees. Muta-      individual classes coupled with predefined productions.
tion operations alter randomly-selected parts of a candidate
solution by swapping out one symbol or instruction for an-       4.4    Fitness Function
other, adding new sub-solutions, or deleting components.         We plan to use the frequency correlation metric previously
This process of ranking and regeneration will iterate until      employed in [13, 14]. In that study the authors assessed the
a target performance threshold is reached or a maximum           empirical validity of a set of a-priori diagram rules. The
number of generations has passed.                                validity of each individual rule was assessed by testing the
                                                                 correlation between the frequency of the class in the existing
EC methods are highly general algorithms that can be read-       graph and the graph grade. The strength of that correlation
ily adapted to novel domains by selecting an appropriate         was estimated using Spearman’s ρ a non-parametic measure
solution representation and modification operations. Thus,       of correlation [27]. In that work the authors demonstrated
in contrast to more specific methods such as SUBDUE, the         that the a-priori rule frequency was correlated with stu-
EC algorithm allows us to tune the inductive bias of our         dents’ subsequent essay grades and showed that the frequen-
search and to explore alternative ways of traversing the so-     cies could be used to predict students’ future performance.
4.5   Mutation
Our mutation operator will draw on the predefined graph
ontology to make atomic changes to an existing graph class.              B      C      D                  F       G
The change will be one of the following operations:
                                                                         1      ∅      3      A            7      ∅      E
Change Node change an existing node’s type.

Change Arc Change an existing arc’s type or orientation.                        4      5      B                   ∅      F

Delete Node Delete a node and its associated arcs.
                                                                                       ∅      C
Delete Arc Delete an existing arc.

Add Node Add a novel node with a specified type.
                                                                    Figure 3: Canonical matricies for crossover parents.
Add Arc Add an arc between existing nodes or add with
    new nodes.

4.6   Crossover                                                          B      G      D                   F      C
By design the crossover operation should, like genetic crossover,
be conservative. Two very similar parents should produce                 1      ∅      3      E            7      ∅      A
similar offspring. Crossover operations should therefore pre-
serve good building blocks and sub-solutions or introns through
random behavior [25]. Arbitrary graph alignment and crossover                   ∅      5      B                   4      F
is a challenging problem that risks causing unsustainable
changes on each iteration. We therefore treat graph crossover
as a matrix problem.                                                                   ∅      G

For each pair of parent classes we will define a pair of di-
agonal matricies of the type illustrated in Figure 3. The           Figure 4: Canonical matricies for crossover children.
letter indicies on the top and right indicate nodes while the
numerical indicies internally indicate arcs, and the ∅ symbol
indicates that no arc is present. The matricies are gener-          grading and feedback. This work represents an improve-
ated in a canonical order based upon the order in which the         ment over prior graph grammar induction algorithms which
nodes were added to the class. Thus on each iteration of the        are limited to classical graph grammars and greedy extrac-
crossover process the corresponding elements will obtain the        tion. This work also represents an extension for evolution-
same index. As a consequence good subsolutions will obtain          ary computation by shifting it into a new domain. As part
the same location and will tend to be preserved over time.          of this work we also plan to explore additional extensions to
                                                                    the standard evolutionary computation algorithm to address
Once a set of parent matricies has been generated we then           problems of over-fitting such as χ2 reduction.
generate two child matricies of the same size as the parents
and then randomly select the node and arc members. In the
example shown in figures 3 and 4 the parents have nodes             6.   REFERENCES
{A,B,C,D} and {E,F,G} while the children have {E,B,G,D}              [1] John L Pfaltz and Azriel Rosenfeld. Web grammars.
and {A,F,C}. Thus we align the nodes in canonical order                  In Proceedings of the 1st international joint conference
and, for each node pair, we flip a coin to decide where they             on Artificial intelligence, pages 609–619. Morgan
are copied. If one parent is larger than the other than any              Kaufmann Publishers Inc., 1969.
additional nodes, in this case D, will be copied to the larger       [2] John L Pfaltz. Web grammars and picture description.
child. We then perform a comparable exchange process for                 Computer Graphics and Image Processing,
the arcs. Each arc or potential arc is defined uniquely in the           1(2):193–220, 1972.
matricies by its endpoints. We thus align the lists of arcs          [3] Horst Bunke. Attributed programmed graph
in a comparable manner and then decide randomly which                    grammars and their application to schematic diagram
arc, or empty arc, to copy. As with the nodes, extra arcs                interpretation. IEEE Transactions on Pattern
from the larger parent, in this case 3,5, and one ∅ are copied           Analysis and Machine Intelligence, 4(6):574–582, 1982.
directly into the larger of the two children.                        [4] Michihiro Kuramochi and George Karypis. Finding
                                                                         frequent patterns in a large sparse graph*. Data
5.    FUTURE WORK                                                        mining and knowledge discovery, 11(3):243–271, 2005.
In this paper we presented a method for the induction of             [5] Keven Ates, Jacek Kukluk, Lawrence Holder, Diane
augmented graph grammars through evolutionary computa-                   Cook, and Kang Zhang. Graph grammar induction on
tion. We are presently applying this work to the automatic               structural data for visual programming. In Tools with
induction of empirically-valid rules for student-produced ar-            Artificial Intelligence, 2006. ICTAI’06. 18th IEEE
gument diagrams. This work will serve to extend our prior                International Conference on, pages 232–242. IEEE,
efforts on the use of augmented graph grammars for student               2006.
 [6] Francesc Rosselló and Gabriel Valiente. Graph                   CSREA Press, 2008.
     transformation in molecular biology. In Formal              [17] Xifeng Yan and Jiawei Han. gspan: Graph-based
     Methods in Software and Systems Modeling, pages                  substructure pattern mining. In Proceedings of the
     116–133. Springer, 2005.                                         IEEE International Conference on Data Mining
 [7] Luc Dehaspe, Hannu Toivonen, and Ross D King.                    (ICDM 2002), pages 721–724. IEEE, 2002.
     Finding frequent substructures in chemical                  [18] Akihiro Inokuchi, Takashi Washio, and Hiroshi
     compounds. In KDD, volume 98, page 1998, 1998.                   Motoda. An apriori-based algorithm for mining
 [8] Stefan Kramer, Luc De Raedt, and Christoph Helma.                frequent substructures from graph data. In Principles
     Molecular feature mining in hiv data. In Proceedings             of Data Mining and Knowledge Discovery, pages
     of the seventh ACM SIGKDD international conference               13–23. Springer, 2000.
     on Knowledge discovery and data mining, pages               [19] Michihiro Kuramochi and George Karypis. Frequent
     136–143. ACM, 2001.                                              subgraph discovery. In Proceedings IEEE International
 [9] Wenke Lee and Salvatore J Stolfo. A framework for                Conference on Data Mining. (ICDM 2001), pages
     constructing features and models for intrusion                   313–320. IEEE, 2001.
     detection systems. ACM transactions on Information          [20] MICHIHIRO Kuramochi and George Karypis. Finding
     and system security (TiSSEC), 3(4):227–261, 2000.                topological frequent patterns from graph datasets.
[10] Calvin Ko. Logic induction of valid behavior                     Mining Graph Data, pages 117–158, 2006.
     specifications for intrusion detection. In Proceedings of   [21] Nitish Manocha, Diane J Cook, and Lawrence B
     the IEEE Symposium on Security and Privacy. (S&P                 Holder. Cover story: structural web search using a
     2000), pages 142–153. IEEE, 2000.                                graph-based discovery system. Intelligence,
[11] Rakesh Agrawal, Ramakrishnan Srikant, et al. Fast                12(1):20–29, 2001.
     algorithms for mining association rules. In Proceedings     [22] Diane J. Cook, Lawrence B. Holder, and G. Michael
     of the 20th International Conference on very large               Youngblood. Graph-based analysis of human transfer
     data bases, VLDB, volume 1215, pages 487–499, 1994.              learning using a game testbed. IEEE Trans. on
[12] Collin F Lynch. Agg: Augmented graph grammars for                Knowl. and Data Eng., 19:1465–1478, November 2007.
     complex heterogeneous data. In Proceedings of the           [23] Charles Darwin. On the Origin of Species by Means of
     first international workshop on Graph-Based                      Natural Selection, or the Preservation of Favoured
     Educational Data Mining (GEDM 2014).                             Races in the Struggle for Life. John Murray:
[13] Collin F. Lynch and Kevin D. Ashley. Empirically                 Albermarle Street: London, United Kingdom, 6
     valid rules for ill-defined domains. In John Stamper             edition, 1872.
     and Zachary Pardos, editors, Proceedings of The 7th         [24] Melanie Mitchell. An Introduction to Genetic
     International Conference on Educational Data Mining              Algorithms. MIT Press: Cambridge, Massachusetts,
     (EDM 2014). International Educational Datamining                 1999.
     Society IEDMS, 2014.                                        [25] Wolfgang Banzhaf. Genetic programming: an
[14] Collin F. Lynch, Kevin D. Ashley, and Min Chi. Can               introduction on the automatic evolution of computer
     diagrams predict essay grades? In Stefan                         programs and its applications. Morgan Kaufmann
     Trausan-Matu, Kristy Elizabeth Boyer, Martha E.                  Publishers ; Heidelburg : Dpunkt-verlag; San
     Crosby, and Kitty Panourgia, editors, Intelligent                Francisco, California, 1998.
     Tutoring Systems, Lecture Notes in Computer Science,        [26] Collin F. Lynch, Kevin D. Ashley, Niels Pinkwart, and
     pages 260–265. Springer, 2014.                                   Vincent Aleven. Argument graph classification with
[15] Sherry E. Marcus, Melanie Moy, and Thayne Coffman.               genetic programming and c4.5. In Ryan
     Social network analysis. In Diane J. Cook and                    Shaun Joazeiro de Baker, Tiffany Barnes, and
     Lawrence B. Holder, editors, Mining Graph Data,                  Joseph E. Beck, editors, The 1st International
     chapter 17, pages 443–468. John Wiley & Sons, 2006.              Conference on Educational Data Mining, Montreal,
[16] Chang Hun You, Lawrence B. Holder, and Diane J.                  Québec, Canada, June 20-21, 2008. Proceedings, pages
     Cook. Dynamic graph-based relational learning of                 137–146, 2008.
     temporal patterns in biological networks changing over      [27] Wikipedia. Spearman’s rank correlation coefficient —
     time. In Hamid R. Arabnia, Mary Qu Yang, and                     wikipedia, the free encyclopedia, 2013. [Online;
     Jack Y. Yang, editors, BIOCOMP, pages 984–990.                   accessed 27-February-2013].
                                 Communities of Performance
                                 & Communities of Preference

                Rebecca Brown                             Collin Lynch                          Yuan Wang
               North Carolina State                   North Carolina State             Teachers College, Columbia
                    University                             University                          University
                   Raleigh, NC                            Raleigh, NC                        New York, NY
             rabrown7@ncsu.edu                       cflynch@ncsu.edu                 elle.wang@columbia.edu
                Michael Eagle                           Jennifer Albert                       Tiffany Barnes
         North Carolina State University              North Carolina State                  North Carolina State
                 Raleigh, NC                               University                            University
              mjeagle@ncsu.edu                            Raleigh, NC                           Raleigh, NC
                                                jennifer_albert@ncsu.edu                 tmbarnes@ncsu.edu
                 Ryan Baker                             Yoav Bergner                       Danielle McNamara
         Teachers College, Columbia                 Educational Testing Service             Arizona State University
                 University                               Princeton, NJ                           Phoenix, AZ
               New York, NY                          ybergner@gmail.com                 dsmcnamara1@gmail.com
       ryanshaunbaker@gmail.com

ABSTRACT                                                            the performance of weaker ones. It has not yet been shown,
The current generation of Massive Open Online Courses (MOOCs)       however, that this type of support occurs in practice.
operate under the assumption that good students will help poor
students, thus alleviating the burden on instructors and Teaching   Prior research on social networks has shown that social groups,
Assistants (TAs) of having thousands of students to teach. In       even those that gather face-to-face, can fragment into disjoint
practice, this may not be the case. In this paper, we examine so-   sub-communities [37]. This small-group separation, if it takes
cial network graphs drawn from forum interactions in a MOOC         place in an online course, can be considered negative or positive,
to identify natural student communities and characterize them       depending on one’s perspective. If poor students communi-
based on student performance and stated preferences. We exam-       cate only with similarly-floundering peers, then they run the
ine the community structure of the entire course, students only,    risk of perpetuating misunderstandings and of missing insights
and students minus low performers and hubs. The presence of         discussed by better-performing peers and teaching staff. An
these communities and the fact that they are homogeneous with       instructor may wish to avoid this fragmentation to encourage
respect to grade but not motivations has important implications     poor students to connect with better ones.
for planning in MOOCs.
                                                                    These enduring subgroups may be beneficial, however, by help-
Keywords                                                            ing students to form enduring supportive relationships. Research
MOOC, social network, online forum, community detection             by Li et al. has shown that such enduring relationships can
                                                                    enhance students’ social commitment to a course [18]. We be-
                                                                    lieve that this social commitment will in turn help to reduce
1.   INTRODUCTION                                                   feelings of isolation and alienation among students in a course.
The current generation of Massive Open Online Courses (MOOCs)       Eckles and Stradley [9] have shown that such isolation is a key
is designed to leverage student interactions to augment instruc-    predictor of student dropout.
tor guidance. The activity in courses on sites such as Coursera
and edX is centered around user forums that, while curated          We have previously shown that students can form stable com-
and updated by instructors and TAs, are primarily constructed       munities and that those communities are homogeneous with
by students. When planning and building these courses, it is        respect to performance [3]. However that work did not: show
hoped that students will help one another through the course        whether these results are consistent with prior work on imme-
and that interacting with stronger students will help to improve    diate peer relationships; address the impact of hub students on
                                                                    these results; or discuss whether students’ varying goals and
                                                                    preferences motivate the community structure. Our goal in this
                                                                    paper is to build upon our prior work by addressing these issues.
                                                                    In the remainder of this paper we will survey prior educational
                                                                    literature on community formation in traditional and online
                                                                    classrooms. We will then build upon our prior work by exam-
                                                                    ining the impact of hub users. And we will look at the impact
                                                                    of user motivations on community formation.
2.    RELATED WORK                                                     2.2    Communities, Hubs, & Peers
                                                                       Kovanovic et al. [15] examined the relationship between social
2.1    MOOCs, Forums, & Student Performance                            network position or centrality, and social capital formation in
A survey of the literature on MOOCs shows the beginnings of a          courses. Their work is specifically informed by the Community
research base generating an abundance of data that has not yet         of Inquiry (COI) framework. the COI framework is focused on
been completely analyzed [19]. According to Seaton et al. [29],        distance education and is particularly suited to online courses of
most of the time students spend on a MOOC is spent in dis-             the type that we study here. The model views course behavior
cussion forums, making them a rich and important data source.          through three presences which mediate performance: cognitive,
Stahl et al. [30] illustrates how through this online interaction      teaching, and social.
students collaborate to create knowledge. Thus students’ forum
activity is good not only for the individual student posting con-      This social presence considers the nature and persistence of
tent or receiving answers, but for the class as a whole. Huang et      student interactions and the extent to which they reinforce stu-
al. [14] investigated the behavior of the highest-volume posters       dents’ behaviors. In their analysis, the authors sought to test
in 44 MOOC-related forums. These “superposters” tended to              whether network relationships, specifically students’ centrality
enroll in more courses and do better in those courses than the         in their social graph, is related to their social performance as
average. Their activity also added to the overall volume of forum      measured by the nature and type of their interactions. To that
content and they left fewer questions unanswered in the forums.        end, they examined a set of course logs taken from a series of
Huang et al. also found that these superposters did not suppress       online courses offered within a public university. They found
the activity of less-active users. Rienties et al. [25] examined the   that students’ position within their social graph was positively
way in which user interaction in MOOCs is structured. They             correlated with the nature and type of their interactions, thus
found that allowing students to self-select collaborators is more      indicating that central players also engaged in more useful social
conducive to learning than randomly assigning partners. Further,       interactions. They did not extend this work to groups, however,
Van Dijk et al. [31] found that simple peer instruction is signif-     focusing solely on individual hub students.
icantly less effective in the absence of a group discussion step,
pointing again to the importance of a class discussion forum.          Other authors have also examined the relationship between
                                                                       network centrality, neighbor relationships, network density, and
More recently Rosé et al. [27] examined students’ evolving inter-     student performance factors. Eckles and Stradley [9] applied
actions in MOOCs using a Mixed-Membership Stochastic Block             network analysis to student attrition, finding that students with
model which seeks to detect partially overlapping communities.         strong social relationships with other students who drop out
They found that the likelihood that students would drop out            are significantly more likely to drop out themselves. Rizzuto
of the course is strongly correlated with their community mem-         et al. [26] studied the impact of social network density on stu-
bership. Students who actively participated in forums early in         dent performance. Network density is defined as the fraction
the course were less likely to drop out later. Furthermore, they       of possible edges that are present in a given graph. Thus it
found one forum sub-community that was much more prone                 is a measure of how “clique-like” the graph is. The authors
to dropout than the rest of the class, suggesting that MOOC            examined self-reported social networks for students in a large
communities are made up of students who behave in similar              traditional undergraduate psychology course. They found that
ways. This community can in turn reflect or impact a student’s         denser social networks were significantly correlated with per-
level of motivation and their overall experience in a course much      formance. However, a dominance analysis [1] showed that this
like the “emotional contagion” model used in the Facebook mood         factor was less predictive than pure academic ability. These re-
manipulation study by Kramer, Guillroy, and Hancock [16].              sults serve to motivate a focus on the role of social relationships
                                                                       in student behavior. Their analysis is complicated, however, by
Yang et al. [36] also notes that unlike traditional courses stu-       their reliance on self-report data which will skew the strength
dents can join MOOCs at different times and observed that              and recency of the reported relationships.
students who join a course early are more likely to be active
and connected in the forums, and less likely to drop out, than         Fire et al. [11] studied student interaction in traditional class-
those who join later. MOOCs also attract users with a range of         rooms, constructing a social network based on cooperation on
individual motivations. In a standard classroom setting students       class assignments. Students were linked based on partnership on
are constrained by availability, convention, and goals. Few stu-       group work as well as inferred cooperation based on assignment
dents enroll in a traditional course without seeking to complete       submission times and IP addresses. The authors found that a
it and to get formal credit for doing so. MOOCs by virtue of           student’s grade was significantly correlated with the grade of
their openness and flexibility attract a wide range of students        the student with the strongest links to that student in the social
with unique personal motivations [10]. Some join the course            network. We perform similar analysis in this paper to examine
with the intent of completing it. Others may seek only to brush        whether the same correlation exists in MOOCs.
up on existing knowledge, obtain specific skills, or just watch
the videos. These distinct motivations in turn lend themselves         Online student interaction in blended courses has also been
to different in-class behaviors including assignment viewing and       linked to course performance. Dawson [8] extracted student
forum access. The impact of user motivations in online courses         and instructor social networks from a blended course’s online
has been previously discussed by Wang et al. [32, 33]; we will         discussion forums and found that students in the 90th grade
build upon that work here. Thus it is an open question whether         percentile had larger social networks than those in the 10th
these motivations affect students’ community behaviors or not.         percentile. The study also found that high-performing students
                                                                       primarily associated with other high-performing students and
                                                                       were more likely to be connected to the course instructor, while
                                                                       low-performing students tended to associate with other low-
performers. In a blended course, this effect may be offset by          the same material as a graduate-level course, Core Methods
face-to-face interaction not captured in the online social network,    in Educational Data Mining, at Teachers College Columbia
but if the same separation happens in MOOC communities, low-           University. The MOOC spanned from October 24, 2013 to
performing students are less likely to have other chances to learn     December 26, 2013. The weekly course was composed of lecture
from high-performing ones.                                             videos and 8 weekly assignments. Most of the videos contained
                                                                       in-video quizzes (that did not count toward the final grade).
2.3    Community Detection
One of the primary activities students engage in on forums             All of the weekly assignments were structured as numeric input
is question answering. Zhang et al. [38] conducted a social            or multiple-choice questions. The assignments were graded au-
network analysis on an online question-and-answer forum about          tomatically. In each assignment, students were asked to conduct
Java programming. Using vertex in-degree and out-degree, they          analyses on a data set provided to them and answer questions
were able to identify a relatively small number of active users        about it. In order to receive a grade, students had to com-
who answered many questions. This allowed the researchers to           plete this assignment within two weeks of its release with up
develop various algorithms for calculating a user’s Java expertise.    to three attempts for each assignment, and the best score out
Dedicated question-and-answer forums are more structured than          of the three attempts was counted. The course had a total
MOOC forums, with question and answer posts identified, but a          enrollment of over 48,000, but a much smaller number actively
similar approach might help identify which students in a MOOC          participated. 13,314 students watched at least one video, 1,242
ask or answer the most questions.                                      students watched all the videos, 1,380 students completed at
                                                                       least one assignment,and 778 made a post or comment in the
Choo et al. [5] studied community detection in Amazon product-         weekly discussion sections. Of those with posts, 426 completed
review forums. Based on which users replied to each other most         at least one class assignment. 638 students completed the online
often, they found communities of book and movie reviewers who          course and received a certificate (meaning that some students
had similar tastes in these products. As in MOOC forums, users         could earn a certificate without participating in forums at all).
did not declare any explicit social relationships represented in the
system, but they could still be grouped by implicit connections.       In addition to the weekly assignments the students were sent
                                                                       a survey that was designed to assess their personal motivations
In the context of complex networks, a community structure is a         for enrolling in the course. This survey consisted of 3 sets
subgraph which is more densely connected internally than it is to      of questions: MOOC-specific motivational items; two PALS
the rest of the network. We chose to apply the Girvan-Newman           (Patterns of Adaptive Learning Survey) sub-scales [21], Aca-
edge-betweenness algorithm (GN) [13]. This algorithm takes as          demic Efficacy and Mastery-Goal Orientation; and an item
input a weighted graph and a target number of communities.             focused on confidence in course completion. It was distributed
It then ranks the edges in the graph by their edge-betweenness         to students through the course’s E-mail messaging system to
value and removes the highest ranking edge. To calculate Edge-         students who enrolled in the course prior to the official start
betweenness we identify the shortest path p(a,b) between each          date. Data on whether participants successfully completed the
pair of nodes a and b in the graph. The edge-betweenness               course was downloaded from the same course system after the
of an arc is defined as the number of shortest paths that it           course concluded. The survey received 2,792 responses; 38% of
participates in. This is one of the centrality measures explored       the participants were female and 62% of the participants were
by Kovanovic et al. above [15]. The algorithm then recalcu-            male. All of the respondents were over 18 years of age.
lates the edge-betweenness values and iterates until the desired
number of disjoint community subgraphs has been produced.              The MOOC-specific items consisted of 10 questions drawn from
Thus the algorithm operates by iteratively finding and removing        previous MOOC research studies (cf. [2, 22]) asking respondents
the highest-value communications channel between communities           to rate their reasons for enrollment. These 10 items address
until the graph is fully segmented. For this analysis, we used         traits of MOOCs as a novel online learning platform. Specifically,
the iGraph library [7] implementation of G-N within R [24].            these 10 items included questions on both the learning content
                                                                       and features of MOOCs as a new platform. Two PALS Survey
The strength of a candidate community can be estimated by              scales [21] measuring mastery-goal orientation and academic
modularity. The modularity score of a given subgraph is defined        efficacy were used to study standard motivational constructs.
as a ratio of its intra-connectedness (edges within the subgraph)      PALS scales have been widely used to investigate the relation
to the inter-connectedness with the rest of the graph minus the        between a learning environment and a student’s motivation (cf.
fraction of such edges expected if they were distributed at ran-       [6, 20, 28]). Altogether ten items with five under each scale
dom [13, 35]. A graph with a high modularity score represents          were included. The participants were asked to select a number
a dense sub-community within the graph.                                from 1 to 5 with 1 meaning least relevant and 5 most relevant.
                                                                       Respondents were also asked to self-rate their confidence on a
3.    DATA SET                                                         scale of 1 to 10 as to whether they could complete the course
                                                                       according to the pace set by the course instructor. All three
This study used data collected from the “Big Data in Education”
                                                                       groups of items were domain-general.
MOOC hosted on the Coursera platform as one of the inaugural
courses offered by Columbia University [32]. It was created in
response to the increasing interest in the learning sciences and       4.   METHODS
educational technology communities in using EDM methods                For our analysis, we extracted a social network from the online
with fine-grained log data. The overall goal of this course was        forum associated with the course. We assigned a node to each
to enable students to apply each method to answer education            student, instructor, or TA in the course who added to it. Nodes
research questions and to drive intervention and improvement in        representing students were labeled with their final course grade
educational software and systems. The course covered roughly           out of 100 points. The Coursera forums operate as standard
threaded forums. Course participants could start a new thread         the network would only share edges with vertices of different
with an initial post, add a post to an existing thread, and add       scores. Thus grade assortativity allows us to measure whether
a comment or child element below an existing post. We added           individuals are not just connected directly to individuals with
a directed edge from the author of each post or comment to the        similar scores but whether they correlate with individuals who
parent post and to all posts or comments that preceded it on          are one step removed.
the thread based upon their timestamp. We made a conscious
decision to omit the textual content of the replies with the goal     Several commonly studied classes of networks tend to have pat-
of isolating the impact of the structure alone.                       terns in their assortativity. Social networks tend to have high
                                                                      assortativity, while biological and technological networks tend
We thus treat each reply or followup in the graph as an implicit      to have negative values (dissortativity) [23]. In a homogeneous
social connection and thus a possible relationship. Such implicit     course or one where students only form stratified communities
social relationships have been explored in the context of recom-      we would expect the assortativity to be very high while in a het-
mender systems to detect strong communities of researchers [5].       erogeneous class with no distinct communities we would expect
This is, by design, a permissive definition that is based upon        it to be quite low.
the assumption that individuals generally add to a thread after
viewing the prior content within it and that individual threads       4.2    Community Detection
can be treated as group conversations with each reply being a         The process of community detection we employed is briefly de-
conscious statement for everyone who has already spoken. The          scribed here [3]. As noted there we elected to ignore the edge
resulting network forms a multigraph with each edge represent-        direction when making our graph. Our goal in doing so was to
ing a single implicit social interaction. We removed self loops       focus on communities of learners who shared the same threads,
from this graph as they indicate general forum activity but           even when they were not directly replying to one-another. We
not any meaningful interaction with another person. We also           believe this to be a reasonable assumption given the role of class
removed vertices with a degree of 0, and collapsed the parallel       forums as a knowledge-building environment in which students
edges to form a simple weighted graph for analysis.                   exchange information with the group. Individuals who partic-
                                                                      ipate in a thread generally review prior posts before submitting
In the analyses below we will focus on isolating student perfor-      their contribution and are likely to return to view the followups.
mance and assessing the impact of the faculty and hub students.       Homogeneity in this context would mean that students gathered
We will therefore consider four classes of graphs: ALL the com-       and communicated primarily with equally-performing peers and
plete graph; Student the graph with the instructor and TAs            thus that they did not consistently draw from better-performing
removed; NoHub the graph with the instructor and hub users re-        classmates and help lower-performing ones or that the at-will
moved; and Survey which includes only students who completed          communities served to homogenize performance, with the stu-
the motivation survey. We will also consider versions of the above    dents in a given cluster evening out over time.
graphs without students who obtained a score of 0, and without
the isolated individuals who connect with at most one other           While algorithms such as GN are useful for finding clusters they
person. As we will discuss below, a number of students received       do not, in and of themselves, determine the right number of
a zero grade in the course. Because this is an at-will course, how-   communities. Rather, when given a target number they will seek
ever, we cannot readily determine why these scores were obtained.     to identify the best possible set of communities. In some imple-
They may reflect a lack of engagement with the course, differen-      mentations the algorithm can be applied to iteratively select the
tial motivations for taking the course, a desire to see the course    maximum modularity value over a possible range. Determining
materials without assignments, or genuinely poor performance.         the correct number of communities to detect, however, is a
                                                                      non-trivial task especially in large and densely connected graphs
4.1    Best-Friend Regression & Assortativity                         where changes to smaller communities will have comparatively
Fire et al. [11] applied a similar social network approach to         small effects on the global modularity score. As a consequence
traditional classrooms and found a correlation between a stu-         we cannot simply optimize for the best modularity score as we
dent’s most highly connected neighbor (”best friend”) and the         would risk missing small but important communities [12].
student’s grade. The links in that graph included cooperation
on assignments as well as partnership on group assignments.           Therefore, rather than select the clusterings based solely on
To examine whether the same correlation existed in a massive          the highest modularity, we have opted to estimate the correct
online course in which students were less likely to know each         number of clusters visually. To that end we plotted a series of
other beforehand and there were no group assignments, we              modularity curves over the set of graphs. For each graph G we
calculated each student’s best friend in the same manner and          applied the GN algorithm iteratively to produce all clusters in
performed a similar correlation.                                      the range (2,|GN |). For each clustering, we then calculated the
                                                                      global modularity score. We examined the resulting scores to
The simple best friends analysis gives a straightforward mech-        identify a crest where the modularity gain leveled off or began to
anism for correlating individual students. However it is also         decrease thus indicating that future subdivisions added no mean-
worthwhile to ask about students who are one-step removed             ingful information or created schisms in existing high-quality
from their peers. Therefore we will also calculate the grade          communities. This is a necessarily heuristic process that is sim-
assortativity (rG ) of the graphs. Assortativity describes the cor-   ilar to the use of Scree plots in Exploratory Factor Analysis [4].
relation of values between vertices and their neighbors [23]. The     We define the number identified as the natural cluster number.
assortativity metric r ranges between -1 and 1, and is essentially
the Pearson correlation between vertex and their neighbors [23].      5. RESULTS AND DISCUSSION
A network with r =1 would have each vertex only sharing edges         Before removing self-loops and collapsing the edges, the network
with vertices of the same score. Likewise, if r =−1 vertices in       contained 754 nodes and 49,896 edges. The final social network
contained 754 nodes and 17,004 edges. 751 of the participants
were students, with 1 instructor and 2 TAs. One individual was
incorrectly labeled as a student when they were acting as the
Chief Community TA. Since this person’s posts clearly indicated
that he or she was acting in a TA capacity with regard to the
forums, we relabeled him/her as a TA. Of the 751 students 304
obtained a zero grade in the course leaving 447 nonzero students.
215 of the 751 students responded to the motivation survey.

There were a total of 55,179 registered users, so the set of 754
forum participants is a small fraction of the entire course audi-
ence. However, forum users are not necessarily those who will
make an effort or succeed in the course. Forum users did not all
participate in the course, and some students who participated in
the course did not use the forums: 1,381 students in the course
got a grade greater than 0, and 934 of those did not post or
comment on the forums, while 304 of the 751 students who did           Figure 1: Modularity for each number of clusters,
participate in the forums received a grade of 0. Clearly students      including students with zeros.
who go to the trouble of posting forum content are in some
respect making an effort in the course beyond those who don’t,
but this does not necessarily correspond to course success.

5.1    Best-Friend Regression & Assortativity
We followed Fire et al.’s methodology for identifying Best Friends
in a weighted graph and calculated a simple linear regression
over the pairs. This correlation did not include the instructor or
TAs in the analysis. We calculated the correlation between the
students’ grades to their best friends’ grades in the set using
Spearman’s Rank Correlation Coefficient (ρ) [34]. The two vari-
ables were strongly correlated, ρ(748)=0.44, p<0.001. However,
the correlation was also affected by the dense clusters of students
with 0 grades. After removing the 0 grade students we found
an additional moderate correlation, ρ(444)=0.29, p<0.001.

Thus the significant correlation between best-friend grade and
grade holds over the transition from the traditional classroom to
                                                                       Figure 2: Modularity for each number of clusters,
a MOOC. This suggests that students in a MOOC, excluding the
                                                                       excluding students with zeros.
many who drop out or do not submit assignments, behave sim-
ilarly to those in a traditional classroom in this respect. These
results are also consistent with our calculations for assortativity.   5.2    Community Structure
There we found a small assortative trend for the grades as shown       The modularity curves for the graphs both with and without
in Table 1. These values reflect that a student was frequently         zero-score students are shown in Figures 1 and 2. We exam-
communicating with students who in turn communicated with              ined these plots to select the natural cluster numbers which are
students at a similar performance level. This in turn supports our     shown in Table 2. As the values illustrate the instructor, TAs,
belief that homogeneous communities may be found. As Table             and hub students have a disproportionate impact on the graph
1 also illustrates, the zero-score students contribute substan-        structure. The largest hub student in our graph connects to
tially to the assortativity correlation as well with the correlation   444 out of 447 students in the network. The graph with all
dropping by as much as a third when they were removed.                 users had lower modularity and required more clusters than the
                                                                       graphs with only students or only non-hubs (see Table 2), with



                                                                       Table 2: Graph sizes and natural number of clusters
                                                                       for each graph.
  Table 1: The grade assortativity for each network.                           Users    Zeros  V     E    Clusters
          Users    Zeros V       E       rG
                                                                                All           Yes   754    17004         212
            All            Yes    754    17004    0.29                          All           No    447     5678         173
            All            No     447     5678    0.20                          Students      Yes   751    15989         184
            Students       Yes    751    15989    0.32                          Students      No    447     5678         169
            Students       No     447     5678    0.20                          Non-Hub       Yes   716     9441          79
            Non-Hub        Yes    716     9441    0.37                          Non-Hub       No    422     3119          52
            Non-Hub        No     422     3119    0.24                          Survey        Yes   215     1679          58
Figure 3: View of the student communities with edges of frequency <2 removed. The Student network with (left)
and without (right) hub-students, with each vertex representing a student and grade represented as color.


the non-hub graph having the highest modularity. This suggests
that non-hub students formed more isolated communities, while          Table 3: Grade statistics by community, selected
teaching staff and hubs communicated across these communities          to show examples of more and less homogeneous
and connected them.                                                    communities.
                                                                            Members Average Grade Standard Deviation
This largely consistent with the intent of the forums and the                      118              21.62                   36.58
active role played by the instructor and TAs in monitoring and                      41              22.00                   32.45
replying to all relevant posts in the forums. It is particularly in-                34              25.41                   40.44
teresting how closely the curves for the ALL and Student graphs                     31              56.13                   47.69
mirror one another. This may indicate that the hub students are                     20              49.05                   45.64
also those that followed the instructor and TAs closely, thus giv-                  16              12.44                   31.13
ing them isomorphic relationships, or it may indicate that they                     14              88.43                   22.47
are more connected than even the instructors and thus came to                       12              96.08                    6.36
bind the forums together on their own. This impact is further                       11              96.45                    7.38
illustrated by the cluster plots shown in Figure 3. Here the ab-                     4               3.00                    6.00
sence of the hub students results in a noticeable thinning of the                    4               8.50                    9.81
graph which in turn highlights the frequency of communication                        4               4.25                    8.50
that can be attributed to this, comparatively small, group.                          4              96.25                    3.50

The difference between the full plots and those with zero values
are also notable as the zero grade students were clearly a major
                                                                       standard deviation for a small selection of the communities in
factor in community formation. A direct examination of the
                                                                       the ALL reply network including zero-grades, hub students,
user graph showed that many of the zero students were only
                                                                       and teaching staff. Several of the communities, particularly
connected to other zero students or were not connected at all.
                                                                       the larger ones, do show a blend of good and poor students,
This is also highlighted in Figure 3. In both graphs the bulk of
                                                                       with a high standard deviation. However many if not most of
the zero score students are clustered in a tight network of com-
                                                                       the communities are more homogeneous with good and poor
munities on the left-hand side. That super-community consists
                                                                       students sharing a community with similarly-performing peers.
primarily of zero score students communicating with other zero-
                                                                       These clusters have markedly lower standard deviation.
score students, a structure we have nick-named the ‘deathball.’
                                                                       An examination of the grade distribution for each of the clusters
5.3 Student Performance & Motivation                                   showed that the scores within each cluster were non-normal.
As the color coding in Figure 3 illustrates, the students did          Therefore we opted to apply the Kruskal-Wallis (KW) test to
cluster by performance. Table 3 shows the average grade and            assess the correlation between cluster membership and perfor-
                                                                     We also found that community membership was not a significant
Table 4: Kruskal-Wallis test of student grade by                     predictor of whether students would complete the motivation
community, for each graph.                                           survey or of students’ motivations. We were surprised by the
      Users    Zeros Chi-Squared df  p-value                         fact that even when we focused solely on individuals who had
      All            Yes       349.0273    211    < 0.005            completed the survey, the students did not connect by stated
      All            No        216.1534    172     < 0.02            goals. This suggests to us that the students are more likely
      Students       Yes       202.0814     78    < 0.005            coalescing around the pragmatic needs of the class or conceptual
      Students       No        80.93076     51    < 0.005            challenges rather than on the winding paths that brought them
      Non-Hub        Yes       309.8525    183    < 0.005            there. One limitation of this work is that by relying on the
      Non-Hub        No        218.9603    168     < 0.01            forum data we were focused solely on the comparatively small
      Survey         Yes       99.99840    577    < 0.005            proportion of enrolled students (6%) who actively participated
                                                                     in the forums. This group is, by definition a smaller set of more
                                                                     actively-involved participants.
mance. The KW test is a nonparametric rank-based analogue
                                                                     In addition to addressing our primary questions this study also
to the common Analysis of Variance [17]. Here we tested grade
                                                                     raised a number of open issues for further exploration. Firstly,
by community number with the community being treated as a
                                                                     this work focused solely on the final course structure, grades, and
categorical variable. The results of this comparison are shown
                                                                     motivations. We have not yet addressed whether these commu-
in Table 4. As that illustrates, cluster membership was a sig-
                                                                     nities are stable over time or how they might change as students
nificant predictor of student performance for all of the graphs
                                                                     drop in our out. Secondly, while we ruled out motivations as a
with the non-zero graphs having markedly lower p-values than
                                                                     basis for the community this work we were not able to identify
those with zero students included. These results are consistent
                                                                     what mechanisms do support the communities. And finally this
with our hypothesis that students would form clusters of equal-
                                                                     study raises the question of generality and whether or not these
performers and we find that those results hold even when the
                                                                     results can be applied to MOOCs offered on different topics or
highly-connected instructors, TAs and hub students are included.
                                                                     whether the results apply to traditional and blended courses.
We performed a similar KW analysis for the questions on the
                                                                     In subsequent studies we plan to examine both the evolution of
motivation survey and for a binary variable indicating whether
                                                                     the networks over time as well as additional demographic data
or not the student completed the survey at all. For this analysis
                                                                     with the goal of assessing both the stability of these networks
we evaluated the clusters on all of the graphs. We found no
                                                                     and the role of other potential latent factors. We will also
significant relationship between the community structure on
                                                                     examine other potential clustering mechanisms that control for
any of the graphs and the survey question results or the survey
                                                                     other user features such as frequency of involvement and thread
completion variable. Thus while the clusters may be driven by
                                                                     structure. We also plan to examine other similar datasets to
separate factors they are not reflected in the survey content.
                                                                     determine if these features transition across classes and class
                                                                     types. We believe that these results may change somewhat once
6. CONCLUSIONS AND FUTURE WORK                                       students can coordinate face to face far more easily than online.
Our goal in this paper was to expand upon our prior community
detection work with the goal of aligning that work with prior
research on peer impacts, notably the work of Fire et al. [11].      7.   ACKNOWLEDGMENTS
We also sought to examine the impact of hub students and             This work was supported by NSF grant #1418269: “Modeling
student motivations on our prior results.                            Social Interaction & Performance in STEM Learning” Yoav
                                                                     Bergner, Ryan Baker, Danielle S. McNamara, & Tiffany Barnes
To that end we performed a novel community clustering analysis       Co-PIs.
of student performance data and forum communications taken
from a single well-structured MOOC. As part of this analysis we      8.   REFERENCES
described a novel heuristic method for selecting natural numbers      [1] R. Azen and D. Budescu. The dominance
of clusters, and replicated the results of prior studies of both          analysis approach for comparing predictors in multiple
immediate neighbors and second-order assortativity.                       regression. Psychological Methods, 8(2):129–48, 2003.
                                                                      [2] Y. Belanger and J. Thornton.
Consistent with prior work, we found that students’ grades                Bioelectricity: A quantitative approach Duke University’s
were significantly correlated with their most closely associated          first MOOC. Journal of Learning Analytics, 2013.
peers in the new networks. We also found that this correlation        [3] R. Brown, C. F. Lynch, M. Eagle, J. Albert, T. Barnes,
extended out to their second-order neighborhood. This is consis-          R. Baker, Y. Bergner, and D. McNamara. Good
tent with our prior work showing that students form stable user           communities and bad communities: Does membership
communities that are homogeneous by performance. We found                 affect performance? In C. Romero and M. Pechenizkiy,
that those results were stable even if instructors, hub players,          editors, Proceedings of the 8th International
students with 0 scores, and students who did not fill out the sur-        Conference on Educational Data Mining, 2015. submitted.
vey were removed from consideration. This suggests that either
                                                                      [4] R. B. Cattell. The scree test for the number of factors.
the students are forming communities that are homogeneous or
                                                                          Multivariate Behavioral Research, 1(2):245–276, 1966.
that the effect of those individual and network features on the
communities and on performance is minimal.                            [5] E. Choo, T. Yu, M. Chi, and
                                                                          Y. Sun. Revealing and incorporating implicit communities
                                                                          to improve recommender systems. In M. Babaioff,
                                                                          V. Conitzer, and D. Easley, editors, ACM Conference
     on Economics and Computation, EC ’14, Stanford,                    structure, student motivation, and academic achievement.
     CA, USA, June 8-12, 2014, pages 489–506. ACM, 2014.                Annual Review of Psychology, 57:487–503, 2006.
 [6] K. Clayton, F. Blumberg,                                      [21] C. Midgley, M. L.
     and D. P. Auld. The relationship between motivation                Maehr, L. Hruda, E. Anderinan, L. Anderman, and K. E.
     learning strategies and choice of environment whether              Freeman. Manual for the Patterns of Adaptive Learning
     traditional or including an online component. British              Scales (PALS). University of Michigan, Ann Arbor, 2000.
     Journal of Educational Technology, 41(3):349–364, 2010.       [22] MOOC @ Edinburgh 2013. MOOC @ Edinburgh
 [7] G. Csardi and T. Nepusz.                                           2013 - report #1. Journal of Learning Analytics, 2013.
     The igraph software package for complex network               [23] M. E. Newman. Assortative Mixing in Networks.
     research. InterJournal, Complex Systems:1695, 2006.                Physical Review Letters, 89(20):208701, Oct. 2002.
 [8] S. Dawson. ’Seeing’ the learning                              [24] R Core Team. R: A Language
     community: An exploration of the development of a                  and Environment for Statistical Computing. R Foundation
     resource for monitoring online student networking. British         for Statistical Computing, Vienna, Austria, 2012.
     Journal of Educational Technology, 41(5):736–752, 2010.       [25] B. Rienties, P. Alcott, and D. Jindal-Snape.
 [9] J. Eckles and E. Stradley. A                                       To let students self-select or not: That is the
     social network analysis of student retention using archival        question for teachers of culturally diverse groups. Journal
     data. Social Psychology of Education, 15(2):165–180, 2012.         of Studies in International Education, 18(1):64–83, 2014.
[10] A. Fini. The technological                                    [26] T. Rizzuto, J. LeDoux, and J. Hatala. It’s not just what you
     dimension of a massive open online course: The                     know, it’s who you know: Testing a model of the relative
     case of the CCK08 course tools. The International Review           importance of social networks to academic performance.
     Of Research In Open And Distance Learning, 10(5), 2009.            Social Psychology of Education, 12(2):175–189, 2009.
[11] M. Fire,                                                      [27] C. P. Rosé, R. Carlson, D. Yang, M. Wen, L. Resnick,
     G. Katz, Y. Elovici, B. Shapira, and L. Rokach. Predicting         P. Goldman, and J. Sherer. Social factors that contribute to
     student exam’s scores by analyzing social network data. In         attrition in MOOCs. In Proc. of the first ACM conference
     Active Media Technology, pages 584–595. Springer, 2012.            on Learning@ scale conference, pages 197–198. ACM, 2014.
[12] S. Fortunato and M. Barthélemy.                              [28] A. M. Ryan and H. Patrick. The classroom
     Resolution limit in community detection. Proc.                     social environment and changes in adolescents’ motivation
     of the National Academy of Sciences, 104(1):36–41, 2007.           and engagement during middle school. American
[13] M. Girvan and M. E. J. Newman. Community structure                 Educational Research Journal, 38(2):437–460, 2001.
     in social and biological networks. Proc. of the National      [29] D. Seaton, Y. Bergner, I. Chuang, P. Mitros, and
     Academy of Sciences, 99(12):7821–7826, June 2002.                  D. Pritchard. Who does what in a massive open online
[14] J. Huang, A. Dasgupta, A. Ghosh,                                   course? Communications of the ACM, 57(4):58–65, 2014.
     J. Manning, and M. Sanders. Superposter behavior              [30] G. Stahl, T. Koschmann,
     in MOOC forums. In Proc. of the first ACM conference               and D. Suthers. Computer-supported collaborative
     on Learning@ scale conference, pages 117–126. ACM, 2014.           learning: An historical perspective. Cambridge
[15] V. Kovanovic, S. Joksimovic, D. Gasevic, and M. Hatala.            handbook of the learning sciences, 2006:409–426, 2006.
     What is the source of social capital? the association         [31] L. Van Dijk, G. Van Der Berg, and H. Van Keulen.
     between social network position and social presence in             Interactive lectures in engineering education. European
     communities of inquiry. In S. G. Santos and O. C. Santos,          Journal of Engineering Education, 26(1):15–28, 2001.
     editors, Proc. of the Workshops held at Educational           [32] Y. Wang and R. Baker. Content or platform:
     Data Mining 2014, co-located with 7th International                Why do students complete MOOCs? MERLOT Journal
     Conference on Educational Data Mining (EDM                         of Online Learning and Teaching, 11(1):191–218, 2015.
     2014), London, United Kingdom, July 4-7, 2014., volume        [33] Y. Wang, L. Paquette, and R. Baker.
     1183 of CEUR Workshop Proc. CEUR-WS.org, 2014.                     A longitudinal study on learner career advancement
[16] A. D. I. Kramer, J. E. Guillory,                                   in MOOCs. Journal of Learning Analytics, 1(3), 2014.
     and J. T. Hancock. Experimental evidence of massive-scale     [34] Wikipedia. Spearman’s
     emotional contagion through social networks. Proc. of the          rank correlation coefficient — Wikipedia, the free
     National Academy of Sciences, 111(24):8788–8790, 2014.             encyclopedia, 2013. [Online; accessed 27-February-2013].
[17] W. H. Kruskal and W. A. Wallis. Use                           [35] Wikipedia. Modularity (networks) — Wikipedia, the free
     of ranks in one-criterion variance analysis. Journal of the        encyclopedia, 2014. [Online; accessed 5-February-2015].
     American statistical Association, 47(260):583–621, 1952.      [36] D. Yang, T. Sinha, D. Adamson, and C. P. Rose. Turn on,
[18] N. Li, H. Verma, A. Skevi,                                         tune in, drop out: Anticipating student dropouts in massive
     G. Zufferey, J. Blom, and P. Dillenbourg. Watching                 open online courses. In Proc. of the 2013 NIPS Data-Driven
     MOOCs together: investigating co-located MOOC                      Education Workshop, volume 10, page 13, 2013.
     study groups. Distance Education, 35(2):217–233, 2014.        [37] W. W. Zachary. An information
[19] T. R. Liyanagunawardena, A. A. Adams, and S. A.                    flow model for conflict and fission in small groups.
     Williams. MOOCs: A systematic study of the published               Journal of Anthropological Research, 33:452–473, 1977.
     literature 2008-2012. The International Review of Research    [38] J. Zhang, M. S. Ackerman, and L. Adamic.
     in Open and Distributed Learning, 14(3):202–227, 2013.             Expertise networks in online communities: structure and
[20] J. L. Meece,                                                       algorithms. In Proc. of the 16th international conference
     E. M. Anderman, and L. H. Anderman. Classroom goal                 on World Wide Web, pages 221–230. ACM, 2007.
                         Using the Hint Factory to
                   Compare Model-Based Tutoring Systems

                                     Collin Lynch                   Thomas W. Price
                                 North Carolina State               North Carolina State
                                      University                         University
                                   890 Oval Drive                     890 Oval Drive
                                 Raleigh, NC 27695                  Raleigh, NC 27695
                                cflynch@ncsu.edu                   twprice@ncsu.edu
                                      Min Chi                        Tiffany Barnes
                                 North Carolina State               North Carolina State
                                      University                         University
                                   890 Oval Drive                     890 Oval Drive
                                 Raleigh, NC 27695                  Raleigh, NC 27695
                                  mchi@ncsu.edu                  tmbarnes@ncsu.edu

ABSTRACT                                                        Model-based tutoring systems are based upon classical ex-
Model-based tutoring systems are driven by an abstract do-      pert systems, which represent relevant domain knowledge
main model and solver that is used for solution validation      via static rule bases or sets of constraints [9]. These knowl-
and student guidance. Such models are robust but costly         edge bases are generally designed by domain experts or with
to produce and are not always adaptive to specific students’    their active involvement. They are then paired with classical
needs. Data-driven methods such as the Hint Factory are         search algorithms or heuristic satisfaction algorithms to au-
comparatively cheaper and can be used to generate indi-         tomatically solve domain problems, identify errors in student
vidualized hints without a complete domain model. In this       solutions, and to provide pedagogical guidance. The goal of
paper we explore the application of data-driven hint analy-     the design process is to produce expert models that give the
sis of the type used in the Hint Factory to existing model-     same procedural advice as a human expert. Classical model-
based systems. We present an analysis of two probability        based tutors have been quite successful in field trials, with
tutors Andes and Pyrenees. The former allows for flexible       systems such as the ACT Programming Tutor helping stu-
problem-solving while the latter scaffolds students’ solution   dents achieve almost two standard deviations higher than
path. We argue that the state-space analysis can be used to     those receiving conventional instruction [5].
better understand students’ problem-solving strategies and
can be used to highlight the impact of different design de-     Data-driven hint generation methods such as those used in
cisions. We also demonstrate the potential for data-driven      Hint Factory [17] take a different approach. Rather than us-
hint generation across systems.                                 ing a strong domain model to generate a-priori advice, data-
                                                                driven systems examine prior student solution attempts to
                                                                identify likely paths and common errors. This prior data
1.   INTRODUCTION                                               can then be used to provide guidance by directing students
Developers of model-based tutoring systems draw on do-          towards successful paths and away from likely pitfalls. In
main experts to develop ideal models for student guidance.      contrast to the expert systems approach, these models are
Studies of such systems have traditionally been focused on      primarily guided not by what experts consider to be ideal
their overall impact on students’ performance and not on the    but by what students do.
students’ user-system interaction. The Hint Factory, by con-
trast, takes a data-driven approach to extract advice based     Model-based systems such as Andes [18] are advantageous
upon students’ problem solving paths. In this paper we will     as they can provide appropriate procedural guidance to stu-
apply the Hint Factory analytically to evaluate the impact      dents at any point in the process. Such models can also be
of user interface changes and solution constraints between      designed to reinforce key meta-cognitive concepts and ex-
two closely-related tutoring systems for probability.           plicit solution strategies [4]. They can also scale up rapidly
                                                                to include new problems or even new domain concepts which
                                                                can be incorporated into the existing system and will be
                                                                available to all future users. Rich domain models, however,
                                                                are comparatively expensive to construct and require the
                                                                long-term involvement of domain experts to design and eval-
                                                                uate them.

                                                                Data-driven methods for generating feedback, by contrast,
                                                                require much lower initial investment and can readily adapt
to individual student behaviors. Systems such as the Hint
Factory are designed to extract solutions from prior student
data, to evaluate the quality of those solutions, and to com-
pile solution-specific hints [17]. While this avoids the need
for a strong domain model, it is limited to the space of so-
lutions explored by prior students. In order to incorporate
new problems or concepts it is necessary to collect additional
data. Additionally, such methods are not generally designed
to incorporate or reinforce higher-level solution strategies.

We believe that both of these approaches have inherent ad-
vantages and are not necessarily mutually exclusive. Our
goal in this paper is to explore what potential data-driven
methods have to inform and augment model-based systems.
We argue that data-driven methods can be used to: (1)
evaluate the differences between closely-related systems; (2)
assess the impact of specific design decisions made in those     Figure 1: The Andes user interface showing the
systems for user behaviors; and (3) evaluate the potential ap-   problem statement window with workspace on the
plication of data-driven hint generation across systems. To      upper left hand side, the variable and equation win-
that end we will survey relevant prior work on model-based       dows on the right hand side, and the dialogue win-
and data-driven tutoring. We will describe two closely-          dow on the lower left.
related tutoring systems and data collected from them. We
will then present a series of analyses using state-based meth-
ods and discuss the conclusions that we drew from them.
                                                                 is associated with a pre-compiled solution graph that defines
                                                                 the set of possible solutions and problem-solving steps. The
2.    BACKGROUND                                                 system uses a principle-driven automated problem solver to
                                                                 compile these graphs and to identify the complete solution
2.1   Model-Based Tutoring                                       paths. The solver is designed to implement the Target Vari-
Model-based tutoring systems take a classical expert-systems     able Strategy (TVS), a backward-chaining problem solving
approach to tutoring. They are typically based upon a            strategy that proceeds from a goal variable (in this case
strong domain model composed of declarative rules and facts      the answer to the problem) via principle applications to the
representing domain principles and problem-solving actions       given information. The TVS was designed with the help of
coupled with an automatic problem solver. This knowledge         domain experts and guides solvers to define basic solution
base is used to structure domain knowledge, define individ-      information (e.g. given variables) and then to proceed from
ual problems, evaluate candidate solutions, and to provide       the goal variable and use principles to define it in terms of
student guidance. Novices typically interact with the system     the given variables.
through problem solving with the system providing solution
validation, automatic feedback, pedagogical guidance, and        Students working with Andes use a multi-modal user inter-
additional problem-solving tasks. The Sherlock 2 system, for     face to write equations, define variables and engage in other
example, was designed to teach avionics technicians about        atomic problem-solving steps. A screenshot of the Andes UI
appropriate diagnostic procedures [11]. The system relies on     can be seen in Figure 1. Andes allows students to solve prob-
a domain model that represents the avionics devices being        lems flexibly, completing steps in any order so long as they
tested, the behavior of the test equipment, and rules about      are valid [20]. A step is considered to be valid if it matches
expert diagnostic methods. Sherlock 2 uses these models          one or more entries in the saved solution paths and all nec-
to pose dynamic challenges to problem solvers, to simulate       essary prerequisites have been completed. Invalid steps are
responses to their actions, and to provide solution guidance.    marked in red, but no other immediate feedback is given.
                                                                 Andes does not force students to delete or fix incorrect en-
Andes [19, 18, 20] and Pyrenees [4] are closely-related model-   tries as they do not affect the solution process. In addi-
driven ITSs in the domains of physics and probability. They      tion to validating entries, the Andes system also uses the
were originally developed at the University of Pittsburgh un-    precompiled solution graphs to provide procedural guidance
der the Direction of Dr. Kurt VanLehn. Like other model-         (next-step-help). When students request help, the system
based systems, they rely on a rule-based domain model and        will map their work to the saved solution paths. It will then
automatic problem solvers that treat the domain rules as         select the most complete solution and prompt them to work
problem-solving steps. They distinguish between higher-          on the next available step.
level domain concepts such as Bayes’ Rule, and atomic steps
such as variable definitions. Principles are defined by a cen-   One of the original goals of the Andes system was to de-
tral equation (e.g. p(A|B) = (p(B|A) ∗ p(A))/p(B)) and           velop a tutor that operated as an “intelligent worksheet.”
encapsulate a set of atomic problem-solving steps such as        The system was designed to give students the freedom to
writing the equation and defining the variables within it.       solve problems in any order and to apply their preferred
                                                                 solution strategy. The system extends this freedom by al-
The systems are designed to function as homework-helpers,        lowing invalid steps in an otherwise valid solution and by
with students logging into the system and being assigned or      allowing students to make additional correct steps that do
selecting one of a set of predefined problems. Each problem      not advance the solution state or are drawn from multiple
                                                                  state of a student’s partial solution at some point during the
                                                                  problem solving process, and each edge represents an action
                                                                  that takes the student from one state to another. A complete
                                                                  solution is represented as a path from the initial state to a
                                                                  goal state. Each state in the interaction network is assigned
                                                                  a weight via a value-iteration algorithm. A new student re-
                                                                  questing a hint is matched to a previously observed state and
                                                                  given context-sensitive advice. If, for example, the student is
                                                                  working on a problem that requires Bayes’ Rule and has al-
                                                                  ready defined p(A), p(B), and p(B|A) then the Hint Factory
                                                                  would first prompt them to consider defining p(A|B), then
                                                                  it would point them to Bayes Rule, before finally showing
                                                                  them the equation p(A|B) = (p(B|A) ∗ p(A))/p(B).

                                                                  These hints are incorporated into existing tutoring systems
Figure 2: The Pyrenees user interface showing the                 in the form of a lookup table that provides state-specific
problem statement at the top, the variable and equa-              advice. When a user asks for help the tutor will match their
tion lists on the left, and the tutor interaction win-            current state to an index state in the lookup table and will
dow with calculator on the lower right.                           prompt them to take the action that will lead them to the
                                                                  highest value neighboring state. If their current state is not
                                                                  found then the tutor will look for a known prior state or
solution paths. This was motivated in part by a desire to         will give up. The Hint Factory has been applied successfully
make the system work in many different educational con-           in a number of domains including logic proofs [17], data
texts where instructors have their own preferred methods          structures [8], and programming [15, 10, 13]. Researchers
[20]. The designers of Andes also consciously chose only to       have also explored other related methods for providing data-
provide advice upon demand when the students would be             driven hints. These include alternative state representations
most willing to accept it. For the students however, par-         [13], path construction algorithms [16, 14], and example-
ticularly those with poor problem-solving skills, this passive    based model-construction [12].
guidance and comparative freedom can be problematic as it
does not force them to adhere to a strategy.                      The primary goal of the Hint Factory is to leverage prior
                                                                  data to provide optimal state-specific advice. By calculating
This problem motivated the development of Pyrenees. Pyre-         advice on a per-state basis, the system is able to adapt to
nees, like Andes acts as a homework helper and supports stu-      students’ specific needs by taking into account both their
dents with on-demand procedural and remediation help. It          current state and the paths that they can take to reach the
uses an isomorphic domain model with the same principles,         goal. As a consequence the authors of the Hint Factory
basic steps, problems, and solution paths. Unlike Andes,          argue that this advice is more likely to be in the students’
however, Pyrenees forces students to applying the target-         Zone of Proximal Development and thus more responsive to
variable-strategy during problem solving. It also requires        their needs than a less-sensitive algorithm.
them to repair incorrect entries immediately before moving
on. Students are guided through the solution process with         3.    METHODS
a menu-driven interface, shown in Figure 2. At each step,         In order to investigate the application of data-driven meth-
the system asks students what they want to work on next           ods to model-based tutoring systems, we collected data from
and permits them to make any valid step that is consistent        two studies conducted with Andes and Pyrenees in the do-
with the TVS. Chi and VanLehn [3] conducted a study of the        main of probability. We then transformed these datasets
two systems and found that scaffolding the TVS in Pyrenees        into interaction networks, consisting of states linked with
helped to eliminate the gap between high and low learners.        actions. We used this representation to perform a variety of
This effect was observed both in the original domain where        quantitative and qualitative analyses with the goal of eval-
it was taught (in their case probability) and it transferred to   uating the differences between the two systems and the im-
a new domain (physics), where students used Andes alone.          pact of the specific design decisions that were made in each.

2.2    Data-Extraction and                                        3.1    The Andes and Pyrenees Datasets
       Data-Driven Tutoring.                                      The Andes dataset was drawn from an experiment con-
One of the longstanding goals of educational data-miners is       ducted at the University of Pittsburgh [3]. This study was
to support the development of data-driven tutoring systems.       designed to assess the differential impact of instruction in
Such systems use past student data to structure pedagogical       Andes and Pyrenees on students’ meta-cognitive and problem-
and domain knowledge, administer conceptual and pedagog-          solving skills. Participants in this study were college under-
ical advice, or evaluate student performance and needs. A         graduates who were required to have taken high-school level
number of attempts have been made to address these goals.         algebra and physics but not to have taken a course in prob-
One of the most successful data-driven systems is the Hint        ability or statistics. The participants were volunteers and
Factory [1, 2, 17]. The Hint Factory takes an MDP-based           were paid by time not performance.
approach to hint generation. It takes as input a set of prior
student logs for a given problem, represented as a network of     Forty-four students completed the entire study. However
interactions [6, 7]. Each vertex in this network represents the   for the purposes of the present analysis, we drew on all
66 students who completed at least one problem in Andes-         tions, this representation uniquely identifies any equation
Probability. This is consistent with prior uses of the Hint      for a given problem. Because we used the same state rep-
Factory which draw from all students including those who         resentation for both tutors, we were able to compare states
did not complete the problem. The Pyrenees-Probability           directly across tutors.
logs from this study were not used due to problems with the
data format that prevented us from completing our analysis.      Additionally, we opted to ignore incorrect entries. Pyrenees
From this dataset we drew 394 problem attempts covering          prevents students from applying the principles of probabil-
11 problems. The average number of steps required to solve       ity improperly and forces them to correct any mistakes made
the problems was 17.6. For each problem we analyzed be-          immediately therefore any errors in the student logs are im-
tween 25 and 72 problem attempts, with an average of 35.8        mediately removed making the paths uninformative. Andes,
attempts per problem. Some attempts were from the same           by contrast, gives students free reign when writing equations
student, with at most two successful attempts per student.       and making other entries. This freedom resulted in syntac-
Over all problems, 81.7% of the attempts were successful,        tic errors and improper rule application errors arising in our
with the remainder being incomplete attempts.                    dataset. The meaning of these invalid equations is inherently
                                                                 ambiguous and therefore difficult to incorporate into a state
The Pyrenees dataset was drawn from a study of 137 stu-          definition. However such errors are immediately flagged by
dents conducted in the 200-level Discrete Mathematics course     the system and may be ignored by the student without con-
in the Department of Computer Science at North Carolina          sequence as they do not affect the answer validity therefore
State University. This study used the same probability text-     they may be safely ignored as well.
book and pre-training materials as those used in the Andes
study. The students used Pyrenees as part of a homework          An edge, or action, in our network represents the correct
assignment, in which they completed 12 problems using the        application of a rule or a correct variable definition and leads
tutoring system. One of these problems was not represented       to a transition from one state to another. For the present
in the Andes dataset. We therefore excluded it from our          dataset and state representation, the possible actions were
analysis, leaving 11 shared problems.                            the definition or deletion of variables or equations. Each of
                                                                 these actions was possible in both tutors.
Unlike the Andes students, however, the Pyrenees students
were not always required to solve every problem. In this         4.    ANALYSIS
study the system was configured to randomly select some          In order to develop a broader understanding of our datasets,
problems or problem steps to present as worked examples          we first visualized the interaction network for each problem
rather than as steps to be completed. In order to ensure         as a weighted, directed graph. We included attempts from
that the results were equivalent we excluded the problem-        both Andes and Pyrenees in the network, and weighted the
level worked examples and any attempt with a step-level          edges and verticies by the frequency with which it appeared
worked example from our analysis. As a consequence, each         in the logs. We annotated each state and edge with the
problem included a different subset of these students. For       weight contributed by each tutor. Two examples of these
each problem we analyzed between 83 and 102 problem at-          graphs are given in Figure 3.
tempts, with an average of 90.8 attempts per problem. Some
attempts were from the same student, with at most one suc-       Throughout this section, we will use these graphs to address
cessful attempt per student. Over all problems, 83.4% of         the points we outlined at the end of Section 1. We begin
the attempts were successful.                                    with a case study from one problem and will explore the
                                                                 student problem solving strategies using our graph repre-
3.2   State and Action Representations                           sentation. We will then compare the Andes and Pyrenees
In order to compare the data from both tutors, we repre-         Systems with a variety of metrics based on this represen-
sented each problem as an interaction network, a represen-       tation. We will relate our observations back to the design
tation used originally in the Hint Factory [7]. In the net-      decisions of each system and identify evidence that may sup-
work a vertex, or state, represents the sum total of a stu-      port or question these decisions. Finally, we will show how
dents’ current problem solving steps at a given time during      the analysis methods associated with data-driven hint gen-
a problem-solving attempt. Because Andes permits flexible        eration can be used to validate some of these findings.
step ordering while Pyrenees does not, we chose to represent
the problem solving state st as the set of valid variables and   4.1    Case Study: Problem Ex242
equations defined by the student at time t.                      The graphical representation of a problem is very helpful for
                                                                 giving a high-level overview of a problem and performing
A variable is a probabilistic expression, such as P (A ∪ B),     qualitative analysis. Problem Ex242, shown in Figure 3,
that the student has identified as important to solving the      presents an interesting scenario for a number of reasons. The
problem, for which the probability is known or sought. An        problem was the 10th in a series of 12 practice problems, and
equation represents the application of a principle of proba-     asked the following:
bility, which relates the values of defined variables, such as
the Complement Theorem, P (A) + P (¬A) = 1. Because
such equations can be written in many algebraically equiv-             Events A, B and C are mutually exclusive and
alent ways, we represent each equation as a 2-tuple, con-              exhaustive events with p(A) = 0.2 and p(B) =
sisting of the set of variables included in the equation (e.g.         0.3. For an event D, we know p(D|A) = 0.04,
{P (A), P (¬A)}) and the principle being applied (e.g. Com-            p(D|B) = 0.03, and p(C|D) = 0.3. Determine
plement Theorem). Because we only represent valid equa-                p(B|D).
                                                               clude them as they represent a fair proportion of the Andes
                                                               data. For instance, 62 of the 126 Andes states for Ex242
                                                               were singleton states.

                                                               Interestingly, while there were small variations among their
                                                               solutions, all of the Andes students choose to apply Bayes’
                                                               Rule rather than relying solely on the Conditional Probabil-
                                                               ity Theorem as suggested by Pyrenees. This, coupled with
                                                               the strong proportion of Pyrenees students who also chose
                                                               the Bayes’ Rule solution, indicates that the solution offered
                                                               by Pyrenees may be unintuitive for students, especially if
                                                               they have recently learned Bayes’ Rule. Again, this can be
                                                               interpreted as evidence that Pyrenees’ strong guidance did
                                                               have an impact on students’ problem solving strategies, but
                                                               it also raises concerns about how reasonable this guidance
                                                               will appear to the students. Regardless of one’s interpreta-
                                                               tion, an awareness of a trend like this can help inform the
                                                               evolution of model-based tutors like Andes and Pyrenees.

                                                               4.2   Comparing datasets
Figure 3: A graph representation of problems                   We now turn to qualitatively comparing the datasets. While
Ex252a, left, and Ex242, right. States and edges are           it is not a common practice to directly compare data from
colored on a gradient from blue to yellow, indicating          different tutors, we argue that it is appropriate, especially
the number of students who reached that state in the           in this context. In longstanding tutoring projects it is com-
Andes and Pyrenees tutors respectively. Rounded                mon for developers and researchers to make many substan-
edges indicate that at least one student from both             tive changes. The Andes system itself has undergone sub-
tutors is present in a state. A green border indicates         stantial interface changes over the course of its development
a solution state, and a pink border indicates that a           [20]. These changes can alter student behavior in substan-
state is contained in the pedagogically “ideal” solu-          tial ways, and it is important for researchers to consider
tion. Edge thickness corresponds to the natural log            how they affect not just learning outcomes but also problem
of the number of problem attempts which included               solving strategies, as was investigated by Chi et al. [4].
the given edge.
                                                               In many respects the close relationship between Andes and
                                                               Pyrenees makes them analogous to different versions of the
The problem is notable in the Pyrenees dataset because it      same tutor and the presence of an isomorphic knowledge
was the only one in which the students were split almost       base and problem set makes it possible for us to draw mean-
evenly among two solution paths. For most of the problems      ingful comparisons between students. In this section we will
in the dataset the vast majority of students followed the      inspect how the scaffolding design decisions made when con-
optimal solution path with only a few finding alternatives.    structing the tutors affected the problem solving strategies
The ideal solution path, as suggested by Pyrenees’ domain      exhibited by the students.
model, employed repeated applications of the Conditional
Probability Theorem: P (A ∩ B) = P (A|B)P (B), which the       A visual inspection of the state graphs for each problem
problem was designed to teach. The students had been pre-      revealed significant portions of each graph were shared be-
viously exposed to Bayes’ Theorem however, and over half       tween the two datasets and portions that were represented
of them chose to apply it instead. This allowed them to        in only one of the two. Despite the fact that students using
circumvent one variable definition and two applications of     Andes were capable of reaching any of the states available
the Conditional Probability Theorem, achieving a slightly      to students in Pyrenees, many Pyrenees states were never
shorter solution path. We make no argument here which          discovered by Andes students. As noted in Section 4.1, this
path the tutor should encourage students to take, but it is    suggests that guidance from the Pyrenees tutor is successful
worth noting that the Hint Factory, when trained on the        in leading students down solution paths that they would not
Pyrenees data for this problem, recommends the shorter,        otherwise have discovered, possibly applying skills that they
more popular path.                                             would otherwise not have used.

The Andes dataset gives us a very different set of insights    To quantify these findings, we calculated the relative sim-
into this problem. Because Andes lacks the strong pro-         ilarity of students in each tutor. For a given problem, we
cess scaffolding of Pyrenees, students were able to make a     defined the state-similarity between datasets A and B as the
wider variety of choices, leading to a graph with many more,   probability that a randomly selected state from a student in
less populous states. While almost every state and edge in     A will be passed through by a randomly selected student
the Pyrenees graph represents multiple students, the An-       in B. Recall from Section 3.2 that our state representation
des graph contains a number of paths, including solutions,     allows us to directly compare states across tutors. By this
that were reached by only one student. In some state-based     definition, the self-similarity of a dataset is a measure of
analyses the authors choose to omit these singleton states,    how closely its students overlap each other while the cross-
for instance when generating hints. We have chosen to in-      similarity is a measure of how closely its students overlap
  States         Andes           Pyrenees         Solution           States        Andes           Pyrenees        Solution
  Andes       0.551 (0.134)    0.494 (0.153)    0.478 (0.186)        Andes      0.419 (0.150)    0.327 (0.139)   0.253 (0.172)
 Pyrenees     0.460 (0.141)    0.688 (0.106)    0.669 (0.146)       Pyrenees    0.372 (0.193)    0.709 (0.145)   0.601 (0.214)
 Actions         Andes           Pyrenees         Solution          Actions        Andes           Pyrenees        Solution
  Andes       0.878 (0.085)    0.874 (0.118)    0.851 (0.140)        Andes      0.818 (0.151)    0.582 (0.168)   0.636 (0.205)
 Pyrenees     0.828 (0.117)    0.936 (0.021)    0.923 (0.036)       Pyrenees    0.678 (0.212)    0.899 (0.038)   0.879 (0.055)

Table 1: Pairwise similarity across tutors and the                 Table 2: Pairwise similarity across tutors and the
ideal solution path. Similarities were calculated for              ideal solution path calculated using a variable-free
each problem, and each cell lists the mean (and stan-              state representation. Rows and columns are the
dard deviation) over all problems. The top half cov-               same as in Table 1.
ers the state similarity metrics while the bottom half
of each table covers action similarity.                                        Problem        Andes      Pyrenees
                                                                                 ex132     26 (47.5%)    2 (98.7%)
                                                                                ex132a     13 (33.3%)    1 (100.0%)
with the other dataset. Similarity measures for the datasets                     ex144      2 (96.6%)    1 (100.0%)
can be found at the top of Table 1.                                              ex152      11 (0.0%)     4 (0.0%)
                                                                                ex152a      8 (59.0%)    3 (97.4%)
Predictably, both datasets have higher self-similarity than                     ex152b      12 (0.0%)    1 (100.0%)
cross-similarity, with Pyrenees showing higher self-similarity                   ex212      8 (71.4%)    1 (100.0%)
than Andes. This indicates that Pyrenees students chose                          ex242       9 (0.0%)    2 (49.38%)
more homogeneous paths to the goal. This is reasonable                           ex252      7 (76.9%)    2 (98.4%)
and consistent with the heavy scaffolding that is built into                    ex252a      4 (81.8%)    2 (98.6%)
the system. It is important to note that our similarity met-                    exc137      19 (0.0%)    2 (98.75%)
rics are not symmetric. The cross-similarity of Pyrenees
with Andes is higher than the reverse. This indicates that         Table 3: For each problem, the tables gives the num-
the path taken by Pyrenees students are more likely to have        ber of unique solution states represented in each tu-
been observed by Andes students than vice-versa. This has          tor’s dataset. Note that there may exist many solu-
important implications for designers who are interested in         tion paths which reach a given solution. The follow-
collecting data from a system that is undergoing modifica-         ing percent (in parentheses) represents the percent
tions. If a system becomes increasingly scaffolded and re-         solution paths that ended in the pedagogically ideal
strictive over time, past data will remain more relevant than      solution.
in a system that is relaxed. In many ways this simply re-
flects the intuition that allowing students to explore a state
space more fully will produce more broadly useful data, and        with the design goals of Pyrenees which was set up to guide
restricting students will produce data that is more narrowly       students along the otherwise unfamiliar path of the TVS.
useful. Note that here we are only observing trends, and we
make no claims of statistical significance.                        We also opted to examine the impact of the variable defini-
                                                                   tions on our evaluation. As noted above, variable definitions
In our analysis, we found that, within both datasets, many         are an atomic action. They do not depend upon any event
solution paths or sub-paths differed only in the order that        assertion and thus have no ordering constraints unlike the
actions were performed. In our domain, many actions do             principles. We did so with the hypothesis that this would
not have ordering constraints. It is possible, for example,        increase the similarity metrics for the datasets by eliminat-
to define the variables A and B in either order, and the re-       ing the least constrained decisions from consideration. Our
sulting solution paths would deviate from one another. We          results are shown in Table 2 below. Contrary to our ex-
thus sought to determine how much of the observed differ-          pectations, this actually reduced the similarity both within
ence between our two datasets was due to these ordering            and across the datasets, with the exception of Pyrenees’ self-
effects. To that end we define the action-similarity between       similarity. Thus the unconstrained variable definitions did
datasets A and P as the probability that a randomly se-            not substantially contribute to the dissimilarity. Rather,
lected action performed by a student in A will have been           most of the variation lay in the order of principle applica-
performed by a by a randomly selected student in B. These          tions.
values are shown in the bottom of Table 1, and each of the
trends observed for state-similarity hold, with predictably        4.3   Similarity to an “ideal" solution
higher similarity values.                                          We now turn to exploring how well the ideal solution was
                                                                   represented in the datasets. For both tutors the ideal solu-
It is notable that the similarity between Pyrenees and Andes       tion is the pedagogically-desirable path constructed via the
is almost as high as Andes’ self-similarity, indicating that the   TVS. Our measure of cross-similarity between two datasets
actions taken by Pyrenees students are almost as likely to         can also be applied between a single solution path and a
be observed in Andes students as Pyrenees students. This           dataset by treating the single solution as a set of one. We
suggests that, for the most part, the Andes students per-          can thus measure the average likelihood of an ideal solution
formed a superset of the actions performed by the Pyrenees         state appearing in a student’s solution from each dataset.
students. Thus the impact of Pyrenees is most visible in the       The results of this calculation are shown in tables 1 and 2,
order of execution, not the actions chosen. This is consistent     using both the state- and action-similarities explained ear-
lier. Predictably, the solution has a high similarity with
Pyrenees students, as these students are scaffolded tightly
and offered few chances to deviate from the path.

As Table 3 shows, Pyrenees students were funneled almost
exclusively to the ideal solution on the majority of prob-
lems, even if their paths to the solution were variable. We
found only one problem, Ex152, where the Pyrenees stu-
dents missed the ideal path. That was traced to a pro-
gramming error that forced students along a similar path.
Otherwise, there was only one problem, Ex242 (discussed
in Section 4.1), where a meaningful percentage of students
chose a different solution. The Andes students, by contrast,
were much less likely to finish in the ideal solution state, but
this was also problem-dependent.

4.4    Applications of the Hint Factory
Finally, having shown that the datasets differ, and that these
                                                                   Figure 4: The four Cold Start curves, averaged
differences are consistent with the differing design choices of
                                                                   across all problems. The x-axis shows the number of
the two tutors, we sought to determine what effect those
                                                                   students used to train the model, and the y-access
differences would have on data-driven hint generation. Our
                                                                   shows the percentage of a new student’s path that
goal was to determine how applicable a hint model of the
                                                                   has available hints. The curve labeled “XvY” indi-
type produced by the Hint Factory would be for one dataset
                                                                   cates training on the X dataset and selecting a new
if it was trained on another. To that end we performed a
                                                                   student from the Y dataset (A = Andes; P = Pyre-
modified version of the Cold Start Experiment [1], which
                                                                   nees).
is designed to measure the number of state-specific hints
that Hint Factory can provide given a randomly selected
dataset. The Cold Start experiment functions like leave-
one-out cross-validation for state-based hint generation. In       system in the same domain. Clearly a substantive interface
the original Cold Start experiment, one student was selected       and scaffolding change of the type made in Pyrenees can
at random and removed from the dataset, to represent a             change the state space sufficiently that we cannot trivially
“new” student using the tutor. Each remaining student in           rely on our prior data. On the other hand, while the cross-
the dataset was then added, one at a time, in a random             application of data does have upper limits, those limits are
order to the Hint Factory’s model. On each iteration, the          comparatively high. Clearly data from a prior system can be
model is updated and the percentage of states on the ‘new’         reused and can serve as a reliable baseline for novel system,
student’s path for which a hint is available is calculated.        with the caveat that additional exploratory data is required.
This is repeated a desired number of times with new students
to account for ordering effects.
                                                                   5.   DISCUSSION AND CONCLUSION
For the present study we calculated cold-start curves for          Our goal in this paper was to evaluate the application of
both the Pyrenees and Andes datasets. We also calculated           data-driven methods such as the Hint Factory to model-
curves using the opposing dataset to illustrate the growth         based tutoring systems. To that end we analyzed and com-
rate for cross-tutor hints. For these modified curves we se-       pared datasets collected from two closely-related tutoring
lected the hint-generating students from the opposing dataset.     systems: Andes and Pyrenees. Through our analysis we
All four curves are shown in Figure 4. Here AvA and PvP            sought to: (1) evaluate the differences between closely-related
designate the within tutor curves for Andes and Pyrenees           systems; (2) assess the impact of specific design decisions
respectively while PvA and AvP designate the cross-tutor           made in those systems for user behaviors; and (3) evalu-
curves for hints from Pyrenees provided to Andes users and         ate the potential application of data-driven hint generation
vice-versa. Figure 4 represents an average over all problems,      across systems.
and therefore the x-axis extends only as far as the minimum
number of students to complete a problem. As the curves            We found that, while the systems shared isomorphic domain
illustrate, the within-tutor curves reach high rates of cov-       models, problems, and ideal solutions, the observed user be-
erage relatively quickly with PvP reaching a plateau above         haviors differed substantially. Students using the Andes sys-
95% after 21 students and AvA reaching 85%.                        tem explored the space more widely, were more prone to
                                                                   identify novel solutions, and rarely followed the ideal solu-
The cross-tutor curves, by contrast, reach much lower lim-         tion path. Students in Pyrenees, by contrast, were far more
its. AvP reaches a plateau of over 75% coverage, while PvA         homogeneous in their solution process and were limited in
reaches a plateau of 60% coverage. This reflects the same          the alternative routes they explored. Contrary to our expec-
trends observed in Tables 1 and 2, where Andes better ex-          tations, we found that this variation was not due to simple
plains the Pyrenees data than vice-versa; however, neither         ordering variations in the simplest of steps but of alterna-
dataset completely covers the other. On the one hand this          tive strategy selection for the higher-level domain principles.
result is somewhat problematic as it indicates that prior data     This is largely consistent with the design decisions that mo-
has a limited threshold for novel tutors or novel versions of a    tivated both systems and with the results of prior studies.
We also found that the state-based hint generation method            Building Expert Systems. Addison-Wesley Publishing
used in the Hint Factory can be applied to the Andes and             Company Inc., Reading, Massachusetts, U.S.A., 1983.
Pyrenees data given a suitable state representation. For this   [10] A. Hicks, B. Peddycord III, and T. Barnes. Building
analysis we opted for a set-based representation given the           Games to Learn from Their Players: Generating Hints
absence of strong ordering constraints across the principles.        in a Serious Game. In Intelligent Tutoring Systems
We then completed a cold-start analysis to show that the             (ITS), pages 312–317, 2014.
cross-tutor data could be used to bootstrap the construction    [11] S. Katz, A. Lesgold, E. Hughes, D. Peters, G. Eggan,
of hints for a novel system but does not provide for complete        M. Gordin, and L. Greenberg. Sherlock 2: An
coverage.                                                            intelligent tutoring system built on the lrdc
                                                                     framework. In C. P. Bloom and R. B. Loftin, editors,
Ultimately we believe that the techniques used for data-             Facilitating the Development and Use of Interactive
driven hint generation have direct application to model-             Learning Environments, Computers, Cognition, and
based systems. Data-driven analysis can be used to iden-             Work, chapter 10, pages 227 – 258. Lawrence Erlbaum
tify the behavioral differences between closely related sys-         Associates, Mawah New Jersey, 1998.
tems and, we would argue, changes from one version of a         [12] R. Kumar, M. E. Roy, B. Roberts, and J. I. Makhoul.
system to another. We also found that these changes can              Toward automatically building tutor models using
be connected to the specific design decisions made during            multiple behavior demonstrations. In
development. Further, we found that data-driven methods              S. Trausan-Matu, K. E. Boyer, M. Crosby, and
can be applied to model-based tutoring data to generate              K. Panourgia, editors, Proceedings of the 12th
state-based hints. We believe that hint information of this          International Conference on Intelligent Tutoring
type may be used to supplement the existing domain models            Systems, LNCS 8474, pages 535 – 544. Springer
to produce more user-adaptive systems. In future work we             Verlag, 2014.
plan to apply these analyses to other appropriate datasets      [13] B. Peddycord III, A. Hicks, and T. Barnes.
and to test the incorporation of state-driven hints or hint          Generating Hints for Programming Problems Using
refinement to existing domain models.                                Intermediate Output. In Proceedings of the 7th
                                                                     International Conference on Educational Data Mining
6.   ACKNOWLEDGMENTS                                                 (EDM 2014), pages 92–98, 2014.
Work supported by NSF Grant #1432156 “Educational Data          [14] C. Piech, M. Sahami, J. Huang, and L. Guibas.
Mining for Individualized Instruction in STEM Learning En-           Autonomously Generating Hints by Inferring Problem
vironments” Min Chi & Tiffany Barnes Co-PIs.                         Solving Policies. In Learning at Scale (LAS), 2015.
                                                                [15] K. Rivers and K. Koedinger. Automatic generation of
7.   REFERENCES                                                      programming feedback: A data-driven approach. In
 [1] T. Barnes and J. Stamper. Toward Automatic Hint                 The First Workshop on AI-supported Education for
     Generation for Logic Proof Tutoring Using Historical            Computer Science (AIEDCS 2013), 2013.
     Student Data. In Intelligent Tutoring Systems (ITS),
                                                                [16] K. Rivers and K. Koedinger. Automating Hint
     pages 373–382, 2008.                                            Generation with Solution Space Path Construction. In
 [2] T. Barnes and J. C. Stamper. Automatic hint                     Intelligent Tutoring Systems (ITS), 2014.
     generation for logic proof tutoring using historical       [17] J. C. Stamper, M. Eagle, T. Barnes, and M. J. Croy.
     data. Educational Technology & Society, 13(1):3–12,             Experimental evaluation of automatic hint generation
     2010.
                                                                     for a logic tutor. I. J. Artificial Intelligence in
 [3] M. Chi and K. VanLehn. Eliminating the gap between              Education, 22(1-2):3–17, 2013.
     the high and low students through meta-cognitive
                                                                [18] K. VanLehn, C. Lynch, K. G. Schulze, J. A. Shapiro,
     strategy instruction. In Intelligent Tutoring Systems           R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and
     (ITS), volume 5091, pages 603–613, 2008.                        M. Wintersgill. The andes physics tutoring system:
 [4] M. Chi and K. VanLehn. Meta-Cognitive Strategy                  Five years of evaluations. In C. Looi, G. I. McCalla,
     Instruction in Intelligent Tutoring Systems: How,               B. Bredeweg, and J. Breuker, editors, Artificial
     When, and Why. Educational Technology & Society,                Intelligence in Education - Supporting Learning
     3(1):25–39, 2010.                                               through Intelligent and Socially Informed Technology,
 [5] A. Corbett. Cognitive computer tutors: Solving the              Proceedings of the 12th International Conference on
     two-sigma problem. In User Modeling 2001, pages                 Artificial Intelligence in Education, AIED 2005, July
     137–147, 2001.                                                  18-22, 2005, Amsterdam, The Netherlands, volume
 [6] M. Eagle and T. Barnes. Exploring Differences in                125 of Frontiers in Artificial Intelligence and
     Problem Solving with Data-Driven Approach Maps. In              Applications, pages 678–685. IOS Press, 2005.
     Educational Data Mining (EDM), 2014.                       [19] K. VanLehn, C. Lynch, K. G. Schulze, J. A. Shapiro,
 [7] M. Eagle and T. Barnes. Exploring Networks of                   R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and
     Problem-Solving Interactions. In Learning Analytics             M. Wintersgill. The andes physics tutoring system:
     (LAK), 2015.                                                    Lessons learned. I. J. Artificial Intelligence in
 [8] D. Fossati, B. D. Eugenio, and S. Ohlsson. I learn              Education, 15(3):147–204, 2005.
     from you, you learn from me: How to make iList learn       [20] K. VanLehn and B. van de Sande. The Andes physics
     from students. In Artificial Intelligence in Education          tutoring system: An experiment in freedom. Advances
     (AIED), 2009.                                                   in intelligent tutoring systems., pages 421–443, 2010.
 [9] F. Hayes-Roth, D. A. Waterman, and D. B. Lenat.
       An Exploration of Data-Driven Hint Generation in an
              Open-Ended Programming Problem

                        Thomas W. Price                                             Tiffany Barnes
                  North Carolina State University                           North Carolina State University
                         890 Oval Drive                                            890 Oval Drive
                        Raleigh, NC 27606                                         Raleigh, NC 27606
                       twprice@ncsu.edu                                         tmbarnes@ncsu.edu

ABSTRACT                                                          lems have a large, often infinite, space of possible states.
Data-driven systems can provide automated feedback in the         Even a relatively simple programming problem may have
form of hints to students working in problem solving environ-     many unique goal states, each with multiple possible solu-
ments. Programming problems present a unique challenge            tion paths, leaving little overlap among student solutions.
to these systems, in part because of the many ways in which       Despite this challenge, a number of attempts have been
a single program can be written. This paper reviews current       made to adapt the Hint Factory to programming problems [9,
strategies for generating data-driven hints for programming       12, 16]. While these approaches have been generally success-
problems and examines their applicability to larger, more         ful, they are most effective on small, well structured pro-
open-ended problems, with multiple, loosely ordered goals.        gramming problems, where the state space cannot grow too
We use this analysis to suggest directions for future work to     large. This paper explores the opposite type of problem: one
generate hints for these problems.                                that has a large state space, multiple loosely ordered goals,
                                                                  unstructured output and involves creative design. Each of
1.   INTRODUCTION                                                 these attributes poses a challenge to current data-driven hint
Adaptive feedback is one of the hallmarks of an Intelli-          generation techniques, but they are also the attributes that
gent Tutoring System. This feedback often takes the form          make such problems interesting, useful and realistic. In this
of hints, pointing a student to the next step in solving a        paper, we will refer to these as open-ended programming
problem. While hints can be authored by experts or gener-         problems. We investigate the applicability of current tech-
ated by a solver, more recent data-driven approaches have         niques to these problems and suggest areas for future re-
shown that this feedback can be automatically generated           search.
from previous students’ solutions to a problem. The Hint
Factory [18] has successfully generated data-driven hints in a    The primary contributions of this paper are 1) a review of
number of problem solving domains, including logic proofs [2],    current data-driven hint generation methods for program-
linked list problems [5] and a programming game [12]. The         ming problems, 2) an analysis of those methods’ applica-
Hint Factory operates on a representation of a problem called     bility to an open-ended problem and 3) a discussion of the
an interaction network [4], a directed graph where each ver-      challenges that need to be addressed before we can expect
tex represents a student’s state at some point in the problem     to generate hints for similar problems.
solving process, and each edge represents a student’s action
that alters that state. A solution is represented as a path       2.     CURRENT APPROACHES
from the initial state to a goal state. A student request-        Current approaches to generating data-driven programming
ing a hint is matched to a previously observed state and          hints, including alternatives to the Hint Factory [10, 13, 17],
directed on a path to a goal state. The Hint Factory takes        can be broken down into three primary components:
advantage of the intuition that students with the same ini-
tial state and objective will follow similar paths, producing a
well-connected interaction network. When this occurs, even             1. A representation of a student’s state and a method
a relatively small sample of student solutions can be enough              for determining when one state can be reached from
to provide hints to the majority of new students [1].                     another, meaning they are connected in the network
                                                                       2. An algorithm that, given a student’s current state, con-
While problems in many domains result in well-connected                   structs an optimal path to a goal state. Often we sim-
networks, this is not always the case. Programming prob-                  plify this to the problem of picking the first state on
                                                                          that path
                                                                       3. A method to present this path, or next state, to the
                                                                          student in the form of a hint


                                                                  For the purposes of this paper, we limit our discussion to
                                                                  the first step in this process. (For a good discussion of the
                                                                  second step, see an analysis by Piech et al. [13], compar-
                                                                  ing path selection algorithms.) While in some domains this
first step is straightforward, in programming tasks, espe-         inition. Hicks and Peddycord [7, 12] used the Hint Factory
cially open-ended problems, it is likely the most challeng-        to generate hints for a programming game called Bots, in
ing. The simplest approach is to take periodic snapshots of        which the player writes a program to direct a robot through
a student’s code and treat these as states, connecting con-        various tasks in a 3D level. They chose to represent the state
secutive snapshots in the network. However, because two            of a player’s program as the final state of the game world
students’ programs are unlikely to match exactly, this ap-         after the program was executed. They compared the avail-
proach is likely to produce a very sparse, poorly connected        ability of hints when using this “world state” model with a
network, making it difficult to match new students to prior        traditional “code state” model, and found that using world
solution attempts. A variety of techniques have been pre-          states significantly reduced the total number of states and
sented to address this problem, which can be grouped into          increased the availability of hints. The challenge with this
three main strategies: canonicalization, connecting states         approach, as noted by the authors, is the generation of ac-
and alternative state definitions.                                 tionable hints. A student may be more capable of making a
                                                                   specific change to her code than determining how to effect
2.1    Canonicalization                                            a specific change in the code’s output.
Canonicalization is the process of putting code states into
a standardized form, often by removing semantically unim-          3.   AN OPEN-ENDED PROBLEM
portant information, so that trivial difference do not prevent     The above techniques have all shown success on smaller,
two states from matching. Rivers and Koedinger [15] present        well-structured problems, with ample data. We want to in-
a method for canonicalizing student code by first represent-       vestigate their applicability to an open-ended problem, as
ing it as an Abstract Syntax Tree (AST). Once in this form,        described in Section 1, where this is not the case. The pur-
they apply a number of functions to canonicalize the code,         pose of this paper is not to create actionable hints, nor are
including normalizing arithmetic and boolean operators, re-        we attempting to show the failures of current methods by
moving unreachable and unused code, and inlining helper            applying them to an overly challenging task. Rather, our
functions. After performing this canonicalization on a set         purpose is exploratory, using a small dataset to identify ar-
of introductory programming problems, they found that a            eas of possible future work, and challenges of which to be
median 70% of states had at least one match in the net-            mindful when moving forward with hint generation research.
work. Jin et al. [9] represent a program’s state as a Linkage
Graph, where each vertex is a code statement, and each di-         We collected data from a programming activity completed
rected edge represents an ordering dependency, determined          by 6th grade students in a STEM outreach program called
by which variables are read and assigned to in each state-         SPARCS [3]. The program, which meets for half-day ses-
ment. This state representation allows the Hint Factory to         sions approximately once a month during the school year,
ignore statement orderings which are not important to the          consists of lessons designed and taught by undergraduate
execution of the program. Lazar and Bratko [10] use the            and graduate students to promote technical literacy. The
actual text of Prolog code to represent a student’s state,         class consisted of 17 students, 12 male and 5 female.
and then canonicalize the code by removing whitespace and
normalizing variable names.                                        The activity was a programming exercise based on an Hour
                                                                   of Code activity from the Beauty and Joy of Computing cur-
2.2    Connecting States                                           riculum [6]. It was a tutorial designed to introduce novices
Even with canonicalization, a student requesting a hint may        to programming for the first time. The exercise had users
not match any existing state in the network. In this case, we      create a simple web-based game, similar to whack-a-mole,
can look for a similar state and create a connection between       in which players attempt to click on a sprite as it jumps
them. Rivers and Koedinger [16] use normalized string edit         around the screen to win points. The exercise was split into
distance as a similarity metric between two program states.        9 objectives, with tutorial text at each stage. Students were
They connect any two states in the network which have at           not required to finish an objective before proceeding. A fin-
least 90% similarity, even if no historical data connect these     ished project required the use of various programming con-
states. Additionally, they use a technique called path con-        cepts, including events, loops, variables and conditionals.
struction to generate new solution paths from a given state        The students used a drag-and-drop, block-based program-
to a nearby, unconnected goal state by searching for a series      ming language called Tiled Grace [8], which is similar to
of insertions, deletions and edits to their AST that will trans-   Scratch [14]. The user writes a program by arranging code
form it into the goal state [17]. They also use this method        blocks, which correspond directly to constructs in the Grace
to discover new goal states which may be closer to the stu-        programming language. The editor also supports switching
dent’s current state. Jin et al. [9] use a similar technique       to textual coding, but this feature was disabled. A screen-
to transform their Linkage Graphs to better match the cur-         shot of the activity can be seen in Figure 1.
rent state of a student when no direct matches can be found
in the interaction network. Piech et al. [13] use path con-        During the activity, the students were allowed to go through
struction to interpolate between two consecutive states on         the exercise at their own pace. If they had questions, the
a solution path which differ by more than one edit. This           students were allowed to ask for help from the student vol-
is useful to smooth data when student code is recorded in          unteers. Students were stopped after 45 minutes of work.
shapshots that are too far apart.                                  Snapshots of a student’s code were saved each time it was
                                                                   run and periodically throughout the session. Occasional
2.3    Alternate State Definitions                                 technical issues did occur in both groups. One student had
Another approach is to forego the traditional code-based           severe technical issues, and this student’s data was not an-
representation of a student’s state, and use an alternate def-     alyzed (and is not reflected in the counts above). Students
                                                                                            Raw     Canonical    Ordered
                                                                 Total States               2380      1781         1656
                                                                 % Unique                  97.5%     94.8%        92.8%
                                                                 Mean NU Count              3.44      3.95         2.82
                                                                 Median NU Count              2         2            2
                                                                 Mean % Path Unique        89.9%     83.0%        78.9%
                                                                 Standard Deviation        (6.67)    (10.5)       (13.3)

                                                                Table 1: Various measures of the sparseness of the
                                                                interaction network for the raw, canonicalized, and
                                                                ordered-canonicalized state representations. Mean
                                                                and median NU counts refer to the number of stu-
                                                                dents who reached each non-unique state.


Figure 1: A screenshot of the Hour of Code activ-               ent code states to be merged in the process. We therefore
ity that students completed. Students received in-              see this as an upper bound on the value of removing unim-
structions on the left panel. The center panel was a            portant orderings from a code state. We recomputed our
work area, and students could drag blocks in from               metrics for the ordered-canonicalized interaction network as
the bottom-right panel. The top-right panel allowed             well. The results can be seen in Table 1. Our later analyses
students to test their games.                                   use the unsorted, canonicalized code representation.

                                                                These results indicate that canonicalization does little to re-
produced on average 148.5 unique code states and accom-         duce the sparsity of the state space, with students spending
plished between 1 and 6 of the activity’s objectives, averag-   most of their time in states that no other student has seen.
ing 3.2 objectives per student.                                 For comparison, recall Rivers and Koedinger found 70% of
                                                                states in a simple programming problem had a match after
4.    ANALYSIS                                                  canonicalization [15], though they were using a much larger
We attempted to understand the applicability of each of the     dataset. In our dataset, it is unlikely that we would be able
techniques discussed in Section 2 to our dataset. However,      to find a direct path from a new student’s state to a goal
since the output of our program was a game that involved        state in order to suggest a hint.
nondeterminism, we felt it would be inappropriate to at-
tempt to represent a program’s state as the result of its       4.2    Connecting States
execution. We therefore focused on the first two strategies,
                                                                To address this, we explored the feasibility of connecting a
canonicalization and connecting states.
                                                                new student’s state to a similar, existing state in the net-
                                                                work. It is unclear how close two code states should be
4.1   Canonicalization                                          before it is appropriate to connect them as in [16], or to
Our initial representation of a student’s code state was a      generate a path between them as in [17]. It certainly de-
tree, where each code block was a node, and its children in-    pends on the state representation and distance metric used.
cluded any blocks that were nested inside of it. In this way,   Rather than identifying a cutoff and measuring how often
our representation was similar to Rivers and Koedinger’s        these techniques could be applied, we chose to visualize the
ASTs. To get a baseline for the sparsity of our dataset,        distance between two students and make qualitative obser-
we first analyzed the code states without performing any        vations. Because our code states were already represented
canonicalization. We calculated the total number of states      as trees, we used Tree Edit Distance (TED) as a distance
in the interaction network and the percentage which were        metric. While Rivers and Koedinger reported better success
only reached by one student. Of those states reached by         with Levenshtein distance [16], we believe that TED is the
multiple students, we calculated the mean and median num-       most appropriate distance metric for block code, where tree
ber of students who reached them. We also calculated the        edit operations correspond directly to user actions.
percentage of each student’s states that were unreached by
any other student in the dataset.                               For each pair of students, A and B, we created an N by M
                                                                distance matrix, D, where N is the number of states in A’s
We then canonicalized the data by removing variable names       solution path, and M is the number of states in B’s solution
and the values of number and string literals. Our problem       path. Di,j = d(Ai , Bj ), where d is the TED distance func-
featured very few arithmetic or logical operators, and these    tion, Ai is the ith state of A and Bj is the jth state of B.
were generally not nested, so we did not normalize them,        We used the RTED algorithm [11] to calculate the distance
as suggested by Rivers and Koedinger [15]. We reran our         function, putting a weight of 1.0 on insertions, deletions and
analyses on the canonicalized data. To ensure that we had       replacements. We also omitted any state which was identical
effectively removed all unimportant ordering information,       to its predecessor state. We normalized these values by di-
we recursively sorted the children of each node in the tree.    viding by the maximum value of Di,j , and plotted the result
This effectively removed any ordering information from a        as an image. Three such images can be seen in Figure 2.
student’s code state, and kept only hierarchical information.
This is somewhat more extreme than the Linkage Graphs of        We also calculated the “path” through this matrix that passes
Jin et al. [9], and it does allow two meaningfully differ-      through the least total distance. This does not represent a
                                                                          Mean         Median          Max           Farthest
                                                                  1    0.25 (0.27)   0.15 (0.29)    0.76 (0.56)     2.23 (0.75)
                                                                  2    4.88 (3.93)   4.95 (4.34)    9.18 (5.74)    12.73 (6.10)
                                                                  4    4.92 (2.77)   4.83 (2.78)   10.11 (3.69)    14.67 (4.77)
                                                                  5    7.79 (1.32)   7.75 (1.41)   13.17 (1.72)    18.17 (1.72)
                                                                  6    7.49 (1.11)   7.76 (1.37)   13.17 (0.98)    18.67 (1.75)

                                                                 Table 2: For each objective, average distances (and
                                                                 standard deviations) of minimum-distance student
                                                                 pairs, using the mean, median and max metrics.
                                                                 For reference, the final column represents the av-
                                                                 erage maximum distance each student moved from
                                                                 the start state while completing the objective.


                                                                 A visual inspection of these matrices reveals that while many
                                                                 student pairs are quite divergent, some show a notable close-
                                                                 ness throughout the exercise. In order to quantify these
                                                                 results, we developed a set of distance metrics between stu-
                                                                 dents. First, the distance matrix and the minimum-distance
                                                                 “path” were calculated for the two students. The path is
                                                                 comprised of pairs of states, and for each pair, we recorded
                                                                 the tree edit distance between the states. From this list of
                                                                 distances, we calculated the mean, median and maximum
                                                                 distances between the two students. We looked at each ob-
Figure 2: Three distance matrices, each comparing                jective in the exercise, and isolated the relevant subpath of
two students, where each pixel represents the TED                each student who completed that objective. We paired each
between two states. Lighter shades of gray indi-                 of these subpaths with the most similar subpath in the set,
cate smaller distances. The green/yellow line shows              using the mean, median and max distance metrics. Table 2
the path through the matrix with minimized total                 shows the average values of these minimized pairs of stu-
distance, with yellow shades indicating smaller dis-             dents, using each metric. Objective 3, and objectives 7-9
tances. Red crosses indicate where both students                 were omitted, as too few students completed them.
met an objective. The top-left figure compares two
students as they completed objectives 1-4 in the ex-
ercise. In this example, the minimum-distance line               5.   DISCUSSION
crosses each objective. The top right figure also de-            We have attempted to apply a meaningful canonicalization
picts two students completing objective 4, but the               to our state space, which did serve to reduce the number
darker colors and straighter line indicate less align-           of states by 30.4%. However, as seen in Table 1, even af-
ment. The bottom figure depicts two students com-                ter the strongest canonicalization, over 90% of the states in
pleting objectives 1-6, with high alignment.                     the interaction network had only been reached by one stu-
                                                                 dent, with an average 78.9% of the states in each student’s
                                                                 solution path being unique to that student. It seems that
path through the interaction network, but rather an align-       our approach to canonicalization is insufficient to produce a
ment between the states of student A and those of student        meaningful reduction of the state space, though it is possible
B. Each pixel of the line represents a pairing of a state        a more stringent canonicalization would be more effective.
from A with a state from B, such that these pairings are
contiguous and represent the smallest total distance. While      Connecting existing states seems to be a more promising ap-
we do not suggest applying this directly as a strategy for       proach, giving us the ability to link new states to previously
hint generation, it serves an a useful visual indicator of the   observed states, even when they do not match exactly. Our
compatibility of two students for hinting purposes.              distance matrices indicate that some students take parallel,
                                                                 or slowly diverging solution paths, which suggests that they
An alternate approach would have been to pair each state in      may be useful to each other in the context of hinting. As
the interaction network with its closest pair from any other     shown in Figure 2, students are often closest together when
student, and use this as a measure of how sparse the network     completing the same objective. This may seem self-evident,
was. We chose to compare whole students, rather than indi-       but it does indicate that our distance metric is meaning-
vidual states, because we felt that the former could lead to     ful. It is more difficult to put the actual TED values into
strange hinting behavior. Imagine, for instance, that a stu-     context. Students get, on average, farther away from their
dent requests a hint, which initially points to a state from     closest paired student as they complete more objectives, but
student B, but at the very next step requests a hint that        this average distance does not exceed 8 tree edits during the
points instead to student C. Perhaps the attributes that         first 6 objectives. To put that number into context, during
make the student’s state similar to that of B are different      this same time students do not, on average, get more than
from those that make the state similar to C. The resulting       19 tree edits away from the start state. This suggest that
hints would be at best confusing, and at worst conflicting.      there is certainly hint-relevant knowledge in these paired stu-
dents, but that it may be difficult to harness this knowledge     [3] V. Cateté, K. Wassell, and T. Barnes. Use and
to generate a hint.                                                   development of entertainment technologies in after
                                                                      school STEM program. In Proceedings of the 45th
                                                                      ACM technical symposium on Computer science
5.1   Limitations                                                     education, pages 163–168, 2014.
It is important to note that this analysis is an exploratory      [4] M. Eagle, M. Johnson, and T. Barnes. Interaction
case study, and makes no strong claims, only observations.            Networks: Generating High Level Hints Based on
We studied data from only 17 novice programmers, and the              Network Community Clustering. In International
problem we analyzed was highly complex, involving multiple            Educational Data Mining Society, pages 164–167,
control structures, loosely ordered objectives, and unstruc-          2012.
tured output. This makes the problem quite dissimilar from        [5] D. Fossati, B. D. Eugenio, and S. Ohlsson. I learn
previous problems that have been been studied in the con-             from you, you learn from me: How to make iList learn
text of hint generation, making it difficult to determine what        from students. In Artificial Intelligence in Education
observations should be generalized.                                   (AIED), 2009.
                                                                  [6] D. Garcia, B. Harvey, L. Segars, and C. How. AP CS
5.2   Future Work                                                     Principles Pilot at University of California, Berkeley.
Rivers and Koedinger [16], as well as Jin et al. [9] note             ACM Inroads, 3(2), 2012.
the limitations of their methods for larger problems, and         [7] A. Hicks, B. Peddycord III, and T. Barnes. Building
each suggest that breaking a problem down into subprob-               Games to Learn from Their Players: Generating Hints
lems would help to address this. Lazar and Bratko [10] at-            in a Serious Game. In Intelligent Tutoring Systems
tempted this in their Prolog tutor by constructing hints for          (ITS), pages 312–317, 2014.
individual lines of code, which were treated as independent       [8] M. Homer and J. Noble. Combining Tiled and Textual
subproblems. Similarly, we may be able to isolate the sub-            Views of Code. In Proceedings of 2nd IEEE Working
section of a student’s code that is currently relevant, and           Conference on Software Visualization, 2014.
treat this as an independent problem. The hierarchical na-        [9] W. Jin, T. Barnes, and J. Stamper. Program
ture of block-based coding environments lends itself to this          representation for automatic hint generation for a
practice, making it an appealing direction for future work.           data-driven novice programming tutor. In Intelligent
                                                                      Tutoring Systems (ITS), 2012.
Our work makes the simplifying assumption that unweighted        [10] T. Lazar and I. Bratko. Data-Driven Program
TED is a reliable distance metric for code, but future work           Synthesis for Hint Generation in Programming Tutors.
should investigate alternative metrics. This might include a          In Intelligent Tutoring Systems (ITS). Springer, 2014.
weighted TED metric, which assigns different costs to inser-     [11] M. Pawlik and N. Augsten. RTED: a robust algorithm
tions, deletions and replacements, or even to different types         for the tree edit distance. Proceedings of the VLDB
of nodes (e.g. deleting a for-loop node might cost more than          Endowment, 5(4):334–345, 2011.
a function call node). Regardless of the metric used, once       [12] B. Peddycord III, A. Hicks, and T. Barnes.
two proximate states are identified, it is still an open ques-        Generating Hints for Programming Problems Using
tion how this information can be best used for hint gener-            Intermediate Output. In Proceedings of the 7th
ation. It is possible to construct a path between the states          International Conference on Educational Data Mining
and direct a student along this path. However, future work            (EDM 2014), pages 92–98, 2014.
might also investigate how to extract hint-relevant informa-     [13] C. Piech, M. Sahami, J. Huang, and L. Guibas.
tion from one state and apply it to a similar state directly.         Autonomously Generating Hints by Inferring Problem
                                                                      Solving Policies. In Learning at Scale (LAS), 2015.
Because of the nature of our problem’s output, we did not
                                                                 [14] M. Resnick, J. Maloney, H. Andrés, N. Rusk,
explore non-code-based state representations, as described
                                                                      E. Eastmond, K. Brennan, A. Millner, E. Rosenbaum,
in Section 2.3. It would still be worth investigating how this        J. Silver, B. Silverman, and Y. Kafai. Scratch:
might be applied to open-ended problems. For instance, a              programming for all. Communications of the ACM,
code state could be represented as a boolean vector, indi-            52(11):60–67, 2009.
cating whether the code has passed a series of Unit Tests,
                                                                 [15] K. Rivers and K. Koedinger. A canonicalizing model
and hints could direct the student to the current flaw in
                                                                      for building programming tutors. In Intelligent
their program. However, creating actionable hints from this
                                                                      Tutoring Systems (ITS), 2012.
information would pose a significant challenge.
                                                                 [16] K. Rivers and K. Koedinger. Automatic generation of
                                                                      programming feedback: A data-driven approach. In
6.    REFERENCES                                                      The First Workshop on AI-supported Education for
 [1] T. Barnes and J. Stamper. Toward Automatic Hint                  Computer Science (AIEDCS 2013), 2013.
     Generation for Logic Proof Tutoring Using Historical        [17] K. Rivers and K. Koedinger. Automating Hint
     Student Data. In Intelligent Tutoring Systems (ITS),             Generation with Solution Space Path Construction. In
     pages 373–382, 2008.                                             Intelligent Tutoring Systems (ITS), pages 329–339,
 [2] T. Barnes, J. Stamper, L. Lehman, and M. Croy. A                 2014.
     pilot study on logic proof tutoring using hints             [18] J. Stamper, M. Eagle, T. Barnes, and M. Croy.
     generated from historical student data. In Proceedings           Experimental evaluation of automatic hint generation
     of the 1st Annual International Conference on                    for a logic tutor. Artificial Intelligence in Education
     Educational Data Mining (EDM), pages 1–5, 2008.                  (AIED), 22(1):3–17, 2013.
     Studio: Ontology-Based Educational Self-Assessment
                                    Christian Weber                             Réka Vas
                                   Corvinno Technology                     Corvinus University
                                     Transfer Center                          of Budapest
                                    Budapest, Hungary                      Budapest, Hungary
                               cweber@corvinno.com                    reka.vas@uni-corvinus.hu


ABSTRACT                                                              cognitive behavior. In his discussion, eight out of nine knowledge
Students, through all stages of education, grasp new knowledge        types underline that knowledge in the scope of learning is
in the context of knowledge memorized all through their previous      interrelated and strongly associated with previous experiences
education. To self-predict personal proficiency in education, self-   [2]. As such, a supporting solution for self-assessment should
assessment acts as an important learning feedback. The in-house       grasp and formalize the knowledge to assess in the context of
developed Studio suit for educational self-assessment enables to      related knowledge.
model the educational domain as an ontology-based knowledge           The Studio suit for educational self-assessment, presented in this
structure, connecting assessment questions and learning material      paper, provides here a software solution for testing the personal
to each element in the ontology. Self-assessment tests are then       proficiency in the context of related knowledge. It enables to
created by utilizing a sub-ontology, which frames a tailored          model areas of education as a substantial source for assessment
testing environment fitting to the targeted educational field. In     and narrows the gap between a potentially flawed self-prediction
this paper we give an overview of how the educational data is         and the real proficiency, by offering an objective and adaptive
modeled as a domain ontology and present the concepts of              online knowledge-test. To follow the natural learning process
different relations used in the Studio system. We will deduct how     and enable an easy extension, the software embeds the assessed
the presented self-assessment makes use of the knowledge              knowledge into a network of contextual knowledge, which
structure for online testing and how it adapts the test to the        enables to adapt the assessment to the responses of the students.
performance of the student. Further we highlight where
potentials are for the next stages of development.                    This paper will give an overview of the Studio educational
                                                                      domain ontology and the aspects of the system supporting
Keywords                                                              personalized self-assessment. Further it will highlight potentials
                                                                      for data mining on the gathered educational data with an outlook
Education, adaptive test, self-assessment, educational ontology       on the next stages of evaluation.

1. INTRODUCTION                                                       2. THE STUDIO APPROACH FOR SELF-
Students exploring new fields of education are always confronted      ASSESSMENT
with questions regarding their individual progress: how much do       The basic concept of Studio is to model the focused education as
they know after iterations of learning, in which directions should    an interrelated knowledge structure, which divides the education
they progress to fill the field most effectively, how to grasp the    into sub-areas and knowledge items to know. The managed
outline and details of the field and how much of their time           structure formalizes the relation between knowledge areas as a
should they invest in learning? Especially in higher education,       learning context and models the requirements to master specific
where learning becomes a self-moderated, personalized process,        parts of the education. This structure is used to create and
students are in need of continuous self-assessment to capture         support knowledge tests for students. Through this combination
their current state of proficiency. At the same time, the             of assessment and knowledge structure, the student gains the
unframed, informal self-prediction of students regarding their        freedom to explore not only single knowledge items but the
personal skills is often substantive and systematically flawed [1].   education in the context of related knowledge areas, while the
Here a systematic and objective solution for self-assessment is       embedded requirements are used to map the modeled knowledge
substantial to prevent a wrong or biased self-evaluation and to       against the expected educational outcome.
support the self-prediction of the personal proficiency.
                                                                      The assessment-system is designed to be accompanied by phases
Following Jonassen, knowledge in education could be split into        of learning within the system, where the student gets access to
nine types across three categories to capture the human’s             learning material, based on and supported by the test feedback.
                                                                      This combined approach offers a unique self-assessment to the
                                                                      students, where the backing knowledge context is used to adapt
                                                                      the assessment in dependency of the test performance of the
                                                                      student.
                                                                      Before any regular examination students may use Studio to
                                                                      assess their knowledge on their own. It is the tutor’s
                                                                      responsibility to set the course of self-assessment test in Studio
system by selecting knowledge areas and sub-knowledge areas            aspects of education as the curriculum or aspects relevant for the
which are relevant for the target education from the domain            task of learning and course creation [8][9][10] or describe the
ontology. Then the frame will be automatically completed with          design, use and retrieval of learning materials till creating
elements from the ontology which detail the selected knowledge         courses [11], as well as directly the learner within the education
areas and are modeled as required for this part of the education.      [12].
As the system stores assessment questions for each knowledge           Within the area of educational ontologies, domain ontologies
element, Studio will then automatically prepare an assessment          tend to model too specific details of the education, in an attempt
test, based on the defined selection and the domain ontology. The      to model the specific field as complete as possible. This enables
resulting knowledge-test is then accessible as a self-assessment       a comprehensive view on the field but it comes at the cost of
test for the student, who explores the backed knowledge                generality, with the potential to be inflexible to handle changes.
structure, which pictures the expected learning outcome, in            Other concepts model the education across different ontologies,
cycles of testing, reflection and learning. The process of test        matching concepts like the learner, the education and the course
definition and assessment is shown in Figure 1, while the result       description, introducing a broad horizon but with additional
preparation for reflection and learning is discussed in section 2.5.   overhead to combine modelled insights and reason on new
                                                                       instances.
                                                                       The appeal of the Studio educational ontology is the size and
                                                                       focus of the main classes and their relationships between each
                                                                       other. The knowledge to learn is the main connecting concept in
                                                                       the core of education. It enables a great flexibility to be
                                                                       resourceful for different education related questions. An example
                                                                       is here the business process management extension PROKEX,
                                                                       which maps process requirements against knowledge areas to
                                                                       create assessment test, reflecting the requirements of attached
                                                                       processes [13].
                                                                       An important factor in learning is the distance between the
                                                                       expectation of the tutor and the learning performance of the
                                                                       student. Here a short cycle of repeated assessment and learning
                                                                       is a major factor for a better personal learning performance [14].
                                                                       This aspect directly benefits from the focused concentration on
                                                                       knowledge-areas as the main exchange concept between students
                                                                       and tutors. As even further the close connections between
                                                                       learners and educators via direct tutoring is one major enabler for
                                                                       computer aided systems [15], each step towards a more direct
                                                                       interaction through focused concepts is an additional supporter.
                                                                       The class structure fuses the idea of interrelated knowledge with
                                                                       a model of the basic types of educational concepts, involved in
                                                                       situations of individual learning. Figure 2 visualizes the class
                                                                       concepts as knowledge elements, together with the relation types,
                                                                       used to model the dependencies between different aspects of
                                                                       knowledge and learning within the educational ontology.
                                                                       The Knowledge Area is the super-class and core-concept of the
                                                                       ontology. The ontology defines two qualities of main relations
                                                                       between knowledge areas: Knowledge areas could be a sub-
                                                                       knowledge area of other knowledge areas with the “has_sub-
Figure 1: The overall design, assess and reflection cyle of the        knowledge_area” relation or be required for another knowledge
                           system.                                     area with the “requires_knowledge_of” relation. A knowledge
                                                                       area may have multiple connected knowledge areas, linked as a
2.1 The Educational Domain Ontology                                    requirement or sub-area. The “requires_knowledge_of” relation
The Studio system is based on a predesigned educational
                                                                       defines that a node is required to complete the knowledge of a
ontology, explained in detail by Vas in [3]. Domain ontology is a      parent knowledge area. This strict concept models a requirement
frequently used term in the field of semantic technologies and
                                                                       dependency between fields of knowledge in education and yields
underlines the storage and conceptualization of domain
                                                                       the potential to assess perquisites of learning, analog to the basic
knowledge and is often used in a number of projects and
                                                                       idea of perquisites within knowledge spaces, developed by
solutions [4][5][6] and could address a variety of domains with        Falmagne [16].
different characteristics in their creation, structure and
granularity, depending on the aim and the modeling person [7]. A       Education is a structured process which splits the knowledge to
specialization in terms of the field is the educational domain         learn into different sub-aspects of learning. Knowledge areas in
ontology which is a domain ontology adapted to the area and            the ontology are extended by an additional sub-layer of
concepts of education. They could target to model different            knowledge elements in order to effectively support educational
and testing requirements. Figure 2 visualizes the sub-elements                 2.3 Creating and Maintaining Tests
and their relations. By splitting the assessed knowledge into sub-             The creation and continues maintenance of the domain ontology
concepts, the coherence and correlation of self-assessment                     is a task of ontology engineering. The ontology engineer (the
questions could be expressed more efficiently and with the                     ontologist), creates, uses and evaluates the ontology [17], with a
potential of a more detailed educational feedback.                             strong focus on maintaining the structure and content. Within
                                                                               Studio, this process is guided and supported by a specialized
              Has sub-                                            Requires     administration workflow and splits in three consecutive task
                                     Knowledge Area                            areas, in line with decreasing access rights:
           knowledge area                                       knowledge of

                                                                                        Ontology engineering (instance level): The creation
                                                                                        and linking of instances of the existing knowledge-area
                                Part of               Part of                           classes into the overall domain ontology.
                                                Part of
                                                                                        Test definition: Knowledge areas, which are relevant
                                                                                        to a target self-assessment test, are selected and
Refers                                                                                  grouped into specialized containers called Concept
                   Basic Concept                                Example
  to
                                          Refers to                                     Groups (CG). These concept groups are organized into
                                                                                        a tree of groups, in line with the target of the
                                                                                        assessment. The final tree in this regards captures a
                                                       Refers to
                                                                                        sub-ontology. Concept groups are internally organized
                                                                                        based on the overall ontology and include all relations
                      Pr
         Co




                        em
          nc




                                                                                        between knowledge elements, as defined within the
                          ise
            lu
               s




                                                                                        domain ontology.
              io
                 n




                                          Theorem               Refers
                                                                  to                    Question and learning material creation: Questions
                                                                                        and learning materials alike are directly connected to
                                                                                        single knowledge areas within the designed test frame
                                                                                        and get imported, if already existing, from the domain
                           Test                           Learning                      ontology. More questions and learning materials are
                         Questions                        Material                      defined now, in line with the additional need of the
                                          Testbank
                                                                                        targeted education and are available for future tests.
                                                                               The pre-developed structure of classes and relations is fixed as
          Figure 2: Model of the educational ontology.                         the central and integral design of the system. A view of the
Theorems express in a condensed and structured way the                         system interface for administration is provided in Figure 3. The
fundamental insights within knowledge areas. They fuse and                     left area shows the visualization of the current ontology section
explain the basic concepts of the depicted knowledge and set                   in revision and the right area shows the question overview with
them in relation to the environment of learning with examples.                 editing options. Tabs give access to additional editing views,
Multiple theorems could be “part_of” a knowledge area. Each                    including the learning material management and interfaces to
theorem may define multiple Basic Concepts as a “premise” or                   modify relations between nodes and node descriptions.
“conclusion”, to structure how the parts of the knowledge area
are related. Examples enhance this parts as a strong anchor for                2.4 Adaptive Self-Assessment
self-assessment questions and “refer_to” the theorems and basic                To prepare an online self-assessment test, the system has to load
concepts as a “part_of” one or more knowledge areas.                           the relevant educational areas from the domain ontology and
                                                                               extract the questions and relations of the filtered knowledge
2.2 The Testbank                                                               areas.
In order to connect the task of self-assessment with the model of              The internal test algorithm makes use of two assumptions:
the educational domain, the system integrates a repository of
assessment questions. Each question addresses one element of                            Knowledge-area ordering: As the main knowledge
the overall knowledge and is directly associated with one                               areas are connected through “requires_knowledge_of”
knowledge area or knowledge element instance within the                                 and “part_of” relations, every path, starting with the
ontology. The domain ontology provides here the structure for the                       start-element, will develop on average from general
online self-assessment while the repository of questions                                concepts to detailed concepts - given that the concept
supplements the areas as a test bank. The target of the self-                           groups in the test definition are also selected and
assessment is to continuously improve the personal knowledge                            ordered to lead from general to more detailed groups.
within the assessed educational areas, by providing feedback on                         Knowledge evaluation dependency: If a person,
the performance after each phase of testing. To do so, the Studio                       taking the test, fails on general concepts he or she will
system includes Learning Material connected to the test bank and                        potentially also fail on more detailed concepts. Further,
the knowledge areas, analog to the test questions. The learning                         if a high number of detailed concepts are failed, the
material is organized into sections as a structured text with                           parent knowledge isn’t sufficiently covered and will be
mixed media, as pictures and videos, and is based on a wiki-                            derived as failed, too.
engine to maintain the content, including external links.
          Figure 3: The main ontology maintenance and administration interface, showing a part of the domain ontology.
The filtering is done based on the selection of a tutor, acting as    same level. If the learner’s answer is correct, the system will
an expert for the target educational area. The tutor chooses          activate the child elements of the current node and draw a
related areas, which are then created as a Test Definition,           random question from the first left child.
containing Concept Groups, as described in section 2.3. The           Based on the tree shaped knowledge structure, the assessment
system then uses the test definition as a filtering list to extract   now follows these steps to run the self-assessment, supported by
knowledge areas. After the extraction, the structure is cached as     the extracted knowledge structure:
a directed graph, while the top element of the initial concept
group is set as a start element. Beginning with the start-element,        1.   Starting from the start-element, the test algorithm will
the test will move then through the graph, while administering                 activate the child knowledge-areas of the start element.
the questions connected to knowledge areas and knowledge                  2.   The algorithm now selects the first child-knowledge
elements.                                                                      area and draws a random question out of the pool of
The loading of knowledge-elements follows three steps:                         available questions for this specific knowledge-element
                                                                               from the test bank.
     1.   Each type of relation between two knowledge-elements
          implements a direction for the connection. Assuming             3.   If the learner fails the question, the algorithm will
          the system loads all relations, starting with the start-             mark the element as failed and select the next
          element and ending on a knowledge-element, this                      knowledge area from the same level. If the learner’s
          creates a two level structure where the start-node is a              answer is correct, the system will activate the child
          parent-element and all related, loaded elements are                  elements of the current node and trigger the process for
          child-elements, as seen below in Figure 4.                           each child-element.
     2.   The loading algorithm then selects one child-element
          and assumes it as a start-element and repeat the
          loading process of knowledge-elements.
     3.   When no knowledge-elements for a parent-element
          could be loaded, the sub-process stops. When all sub-
          processes have stopped, the knowledge structure is
          fully covered.
The test algorithm will now activate the child knowledge areas of
the start element and select the first knowledge area to the left
and draw a random question from the selected knowledge area. If        Figure 4: Excerpt from the sub-ontology visualization, with
the learner fails the question, the algorithm will mark the             the visible parent-child relationship, as used in the data-
element as failed and selects the next knowledge area from the                  loading for preparing the self-assessment.
An example question is shown below in Figure 5. Further               JavaScript framework [18]. The visualization itself is a custom
following the testing algorithm, the system dives down within the     build, similar to the Ext JS graph function “Radar” and based on
domain ontology and triggers questions depending on the               the idea of Ka-Ping, Fisher, Dhamija and Hearst [19]. All views
learner’s answers and the extracted model of the relevant             are able to zoom in and out of the graph, move the current
education. In this regards the Studio system adapts the test on the   excerpt and offer a color code legend, explaining the meaning of
fly to the performance of the learner. Correlating to the idea of     the colored nodes. In comparison with state of the art, the
adaptation, the learner will later gain access to learning material   interface offers no special grouping or additional visualization
for each mastered knowledge area. As the learner continues to         features like coding information into the size of nodes. Each
use the self assessment to evaluate the personal knowledge, he or     interface offers an additional textual tree view to explore the
she will thus explore different areas of the target education,        knowledge-elements or concept groups in a hierarchical listing.
following their individual pace of learning.                          This simple, straightforward approach for visualization correlates
                                                                      with the goal of a direct and easy to grasp feedback through
                                                                      interfaces which have a flat learning curve and enable to catch
                                                                      the functionality in a small amount of time.
                                                                      While this simple visualization is sufficient for the reasonable
                                                                      amount of knowledge-elments within the result view, this alone
                                                                      is not suitable for the domain ontology administration interface,
Figure 5: Test interface with a random drawn test question.           as seen in Figure 3. Here Studio realizes methodologies to filter
                                                                      and transform the data to visualize. To do so it makes use of two
2.5 Test Feedback and Result Visualization                            supporting mechanisms:
An important aspect of the system is the test feedback and                      The maximum-level-selector defines the maximum
evaluation interface. The educational feedback is one of the main               level the system extracts from the domain ontology for
enabler for the student to grasp the current state and extend of                full screen visualization.
the personal education. The domain ontology models the
structure and the dependencies of the educational domain, and                   In combination with the maximum level, the ontologist
the grouped test definition extracts the relevant knowledge for                 could select single elements within the domain
the target area or education. As such, the visualization of the                 ontology. This triggers an on-demand re-extraction of
ontology structure extracted for the test, together with the                    the visualized data, setting the selected knowledge-
indication of correct and incorrect answers, represents a map of                element as the centre element. The system then loads
the knowledge of the learner.                                                   the connected nodes, based on their relations into the
                                                                                orientation circles till the maximum defined level is
Throughout each view onto the ontology, the system uses the                     reached. More details about the transformation are in
same basic visualization, making use of the Sencha Ext JS                       [19].




                                Figure 6: Result visualization as educational feedback for the learner.
Together, this selection and transformation mechanism enables        Each event stores information about the system in 7
the fluent navigation within the complete domain ontology            dimensions, as described in Table 1 below:
structure, while re-using the same visualization interface.          Table 1: Event blueprint to store events concerning system
Figure 6 shows the main view of the result interface. The left                              interaction.
area shows the sub-ontology extracted for the test, while the
                                                                              Attribute                      Description
colored nodes represent the answers to the administered
questions. A red node visualizes wrong answers, while orange         Event description code       Which type of event and what
nodes are rejected nodes with correct answers but with an                                         factors are relevant.
insufficient number of correctly answered child nodes,               Location code                On which part of the assessment-
indicating a lack of the underlying knowledge. Green nodes                                        process or interface the event has
represent accepted nodes with correct answers and a sufficient                                    occurred.
amount of correctly answered questions for child nodes. Grey
nodes are not administered nodes, which were not yet reached         Session identifier           Each access of the system is one
by the learner, as higher order nodes had no adequate                                             session for one user.
acceptance.                                                          Numerical value storage      Multi-purpose     field,   filled
Even though the target of the system is not a strict evaluation in                                depending on the event type.
number, the evaluation of the percentage of solved and               String value storage         Multi-purpose     field,   filled
accepted knowledge elements helps the learner to track the                                        depending on the event type.
personal progress and could additionally be saved as a report
for further consultation. Besides providing an overview of the       Event-time                   The time of the start of the
                                                                                                  event.
self-assessment result, the result interface gives access to the
integrated learning material. For every passed node, the learner     Item reference               A unique reference code,
can now open the correlated material and intensify the                                            identifying the correlated item
knowledge for successful tested areas.                                                            within the ontology. E.g. a
Retaking the test in cycles of testing and learning, while                                        question or a knowledge-element
adapting the educational interaction, is the central concept of                                   ID.
the Studio approach for self-assessment. As a consequence the        All events are stored in order of their occurrence, so if no
system will not disclose the right answers to questions or           explicit end event is defined, the next event for the same
learning material for not yet administered knowledge areas, to       session and user is acting as the implicit end date. Extending
promote an individual reflection on the educational content          the existing storage of information within Studio, the new
outside of a flat memorization of content.                           logging system stores the additional events, as shown in Table
                                                                     2 below:
3. SYSTEM EVALUATION
                                                                             Table 2: Assessment events and descriptions.
The system has been used, extended and evaluated in a number
of European and nationally funded research projects, including              Event type                      Description
applications in business process management and innovation-          START_TEST                 Marks the start of a test.
transfer [20], medical education [21] and job market
competency matching [22].                                            END_TEST                   Marks the end of a test.
Currently the system is being evaluated based on a running           OPEN_WELCOME_LM            The user opened the welcome
study with 200 university students in the field of business                                     page.
informatics. The study will conclude on two current research         OPEN_LM_BLOCK              The student opened a learning
streams which are improving the systems testing and analysis                                    material block on the test
capability. The first direction looks into potentials for the                                   interface.
integration of learning styles into adaptive learning systems to
offer valuable advice and instructions to teachers and students      OPEN_LM                    The student opened the learning
[23]. Within the second direction the question is challenged on                                 material tab on the test interface.
how to adapt the presented self-assessment further towards the       RATE_LM                    The student rated the learning
performance of the students, based on extracting assessment                                     material.
paths from the knowledge structure [24].
                                                                     CHECK_RESULT               The student opened a result page.
For each running test, Studio collects basic quantitative data
about the number of assigned questions, how often tests are          CONTINUE_TEST              The student submitted an answer.
taken and how many students open which test and when. This           FINISH_TEST                The test has been finished.
is completed by qualitative measures, collecting which
                                                                     SUSPEND_TEST               The user suspended the test.
questions and knowledge elements the students passed or
failed. To conclude further on the mechanisms and impacts of         RESUME_TEST                The user has restarted a previously
Studio within the current study, a new logging system was                                       suspended test.
developed, collecting the interaction with the system and            SELECT_TEST_ALGO-          The algorithm used to actually test
detailed information about the feedback as detailed events.          RITHM                      the student is selected.
TEST_ALGORITHM_-             The behavior of the current test              text,” Expert Systems with Applications, vol. 34, no. 2, pp.
EVENT                        algorithm changes, e.g. entering              1474–1480, Feb. 2008.
                             another stage of testing.                [5] S.-H. Wu and W.-L. Hsu, “SOAT: a semi-automatic
                                                                           domain ontology acquisition tool from Chinese corpus,” in
ASK_TESTQUESTION             Sends out a test question to the
                             user to answer.                               Proceedings of the 19th international conference on
                                                                           Computational linguistics-Volume 2, 2002, pp. 1–5.
STUDIO_LOGOUT                The user logs out of the Studio          [6] M. Missikoff, R. Navigli, and P. Velardi, “The Usable
                             system.                                       Ontology: An Environment for Building and Assessing a
To store the events, the system implements an additional                   Domain Ontology,” in The Semantic Web — ISWC 2002, I.
logging database, splitting the concepts of the logging to a star-         Horrocks and J. Hendler, Eds. Springer Berlin Heidelberg,
schema for efficient extraction, transformation and loading. The           2002, pp. 39–53.
logging system is modular and easy to extend with new                 [7] T. A. Gavrilova and I. A. Leshcheva, “Ontology design
concepts and easy to attach to potential event positions within            and individual cognitive peculiarities: A pilot study,”
the Studio runtime. Together with the existing logging of the              Expert Systems with Applications, vol. 42, no. 8, pp.
assessment evaluation feedback, this new extension tracks the              3883–3892, May 2015.
exploration of the sub-ontology within the assessment and             [8] S. Sosnovsky and T. Gavrilova, “Development of
enriches the feedback data with context information of the                 Educational Ontology for C-programming,” 2006.
students behavior on the system.                                      [9] V. Psyché, J. Bourdeau, R. Nkambou, and R. Mizoguchi,
                                                                           “Making learning design standards work with an ontology
4. NEXT STEPS                                                              of educational theories,” in 12th Artificial Intelligence in
The domain ontology offers a functional and semantically rich              Education (AIED2005), 2005, pp. 539–546.
core for supporting learning and education. Yet not all the           [10] T. Nodenot, C. Marquesuzaá, P. Laforcade, and C.
semantic potentials are fully leveraged to support and test the            Sallaberry, “Model Based Engineering of Learning
learner’s progress. The “requires_knowledge_of” relation-                  Situations for Adaptive Web Based Educational Systems,”
requirement is a potential start-concept to model sub-areas as             in Proceedings of the 13th International World Wide Web
groups which together compose the dependency. This could act               Conference on Alternate Track Papers &Amp; Posters,
as an additional input for the assessment, where the system                New York, NY, USA, 2004, pp. 94–103.
derives more complex decision how to further explore the              [11] A. Bouzeghoub, C. Carpentier, B. Defude, and F.
related parts of the structure [25]. This could also be visualized,        Duitama, “A model of reusable educational components
enabling the learner to grasp the personal knowledge as a                  for the generation of adaptive courses,” in Proc. First
visible group of concepts.                                                 International Workshop on Semantic Web for Web-Based
Besides giving colors to the different types of relations, the             Learning in conjunction with CAISE, 2003, vol. 3.
visualizing of edges between knowledge areas is yet unfiltered,       [12] W. Chen and R. Mizoguchi, “Learner model ontology and
offering no further support for navigation. A next stage of                learner model agent,” Cognitive Support for Learning-
implementation could be the introduction of a visual ordering              Imagining the Unknown, pp. 189–200, 2004.
and grouping of knowledge areas and relations. Underlying             [13] G. Neusch and A. Gábor, “PROKEX – INTEGRATED
relations of sub-nodes could be interpreted visually through the           PLATFORM FOR PROCESS-BASED KNOWLEDGE
thickness of relations between nodes, easing the perception of             EXTRACTION,” ICERI2014 Proceedings, pp. 3972–
complex parts of the domain ontology, especially within                    3977, 2014.
administration and maintenance tasks.                                 [14] H. L. Roediger III, “Relativity of Remembering: Why the
                                                                           Laws of Memory Vanished,” Annual Review of
The feedback of the current evaluation study of Studio will
                                                                           Psychology, vol. 59, no. 1, pp. 225–254, 2008.
provide additional insights into the usage of the system by the
                                                                      [15] J. D. Fletcher, “Evidence for learning from technology-
students. Based on this new data it is possible to mine profiles
                                                                           assisted instruction,” Technology applications in
over time on the knowledge structure. One major application is
                                                                           education: A learning view, pp. 79–99, 2003.
here the creation of behavior profiles, as proposed in [23].
                                                                      [16] J.-C. Falmagne, M. Koppen, M. Villano, J.-P. Doignon,
                                                                           and L. Johannesen, “Introduction to knowledge spaces:
5. REFERENCES                                                              How to build, test, and search them,” Psychological
[1] D. Dunning, C. Heath, and J. M. Suls, “Flawed self-                    Review, vol. 97(2), pp. 201–224, 1990.
    assessment implications for health, education, and the            [17] F. Neuhaus, E. Florescu, A. Galton, M. Gruninger, N.
    workplace,” Psychological science in the public interest,              Guarino, L. Obrst, A. Sanchez, A. Vizedom, P. Yim, and
    vol. 5, no. 3, pp. 69–106, 2004.                                       B. Smith, “Creating the ontologists of the future,” Applied
[2] D. Jonassen, “Reconciling a Human Cognitive                            Ontology, vol. 6, no. 1, pp. 91–98, 2011.
    Architecture,” in Constructivist Instruction: Success Or          [18] “Ext        JS       API.”       [Online].        Available:
    Failure?, S. Tobias and T. M. Duffy, Eds. Routledge,                   http://www.objis.com/formationextjs/lib/extjs-
    2009, pp. 13–33.                                                       4.0.0/docs/index.html. [Accessed: 04-May-2015].
[3] R. Vas, “Educational Ontology and Knowledge Testing,”             [19] K.-P. Yee, D. Fisher, R. Dhamija, and M. Hearst,
    Electronic Journal of Knowledge Management, vol. 5, no.                “Animated exploration of dynamic graphs with radial
    1, pp. 123 – 130, 2007.                                                layout,” in IEEE Symposium on Information
[4] M. Y. Dahab, H. A. Hassan, and A. Rafea, “TextOntoEx:                  Visualization, 2001. INFOVIS 2001, 2001, pp. 43–50.
    Automatic ontology construction from natural English
[20] M. Arru, “Application of Process Ontology to improve the     [23] H. M. Truong, “Integrating learning styles and adaptive e-
     funding allocation process at the European Institute of           learning system: Current developments, problems and
     Innovation and Technology.,” in 3rd International                 opportunities,” Computers in Human Behavior, 2015.
     Conference on Electronic Government and the                  [24] C. Weber, “Enabling a Context Aware Knowledge-Intense
     Information Systems Perspective (EGOVIS)., Munich,                Computerized Adaptive Test through Complex Event
     Germany, 2014.                                                    Processing,” Journal of the Sientific and Educational on
[21] M., M., Ansari, F., Dornhöfer Khobreh and M. Fathi, “An           Forum on Business Information Systems, vol. 9, no. 9, pp.
     ontology-based Recommender System to Support Nursing              66–74, 2014.
     Education and Training,” in LWA 2013, 2013.                  [25] C. Weber and R. Vas, “Extending Computerized Adaptive
[22] V. Castello, L. Mahajan, E. Flores, M. Gabor, G. Neusch,          Testing to Multiple Objectives: Envisioned on a Case
     I. B. Szabó, J. G. Caballero, L. Vettraino, J. M. Luna, C.        from the Health Care,” in Electronic Government and the
     Blackburn, and F. J. Ramos, “THE SKILL MATCH                      Information Systems Perspective, vol. 8650, A. Kő and E.
     CHALLENGE. EVIDENCES FROM THE SMART                               Francesconi, Eds. Springer International Publishing, 2014,
     PROJECT,” ICERI2014 Proceedings, pp. 1182–1189,                   pp. 148–162.
     2014.
                 Graph Analysis of Student Model Networks

                          Julio Guerra                                             Yun Huang
                 School of Information Sciences                            Intelligent Systems Program
                    University of Pittsburgh                                  University of Pittsburgh
                     Pittsburgh, PA, USA                                        Pittsburgh, PA, USA
                        jdg60@pitt.edu                                          yuh43@pitt.edu

                         Roya Hosseini                                         Peter Brusilovsky
                   Intelligent Systems Program                           School of Information Sciences
                      University of Pittsburgh                              University of Pittsburgh
                        Pittsburgh, PA, USA                                  Pittsburgh, PA, USA
                        roh38@pitt.edu                                          peterb@pitt.edu


ABSTRACT                                                         same for all students and at all times. In this work we ex-
This paper explores the feasibility of a graph-based approach    plore the idea that it might be beneficial for a student model
to model student knowledge in the domain of programming.         to include connections between domain KEs that represent
The key idea of this approach is that programming concepts       some aspects of individual student knowledge rather than
are truly learned not in isolation, but rather in combina-       domain knowledge. This idea is motivated by the recogni-
tion with other concepts. Following this idea, we represent      tion that the mastery in many domains is reached as the
a student model as a graph where links are gradually added       student practices connecting different KEs, i.e., each KE is
when the student’s ability to work with connected pairs of       practiced in conjunction with other KEs. To address this, we
concepts in the same context is confirmed. We also hypothe-      build a model represented as a network of KEs that get pro-
size that with this graph-based approach a number of tradi-      gressively connected as the student successfully works with
tional graph metrics could be used to better measure student     problems and assessment items containing multiple KEs. As
knowledge than using more traditional scalar models of stu-      the student succeeds in more diverse items mapped to dif-
dent knowledge. To collect some early evidence in favor of       ferent KEs, her model gets better connected.
this idea, we used data from several classroom studies to
correlate graph metrics with various performance and moti-       To explore the value of this graph-based representation of
vation metrics.                                                  student knowledge, we compute different graph metrics (e.g.,
                                                                 density, diameter) for each student and analyze them in re-
1.   INTRODUCTION                                                lation to student performance metrics and attitudinal ori-
Student modeling is widely used in adaptive educational sys-     entations drawn from a motivational theory. This analysis
tems and tutoring systems to keep track of student knowl-        was performed using data collected from 3 cohorts of a Java
edge, detect misconceptions, provide targeted support and        programming course using the same system and the same
give feedback to the student [2]. The most typical overlay       content materials. In the remaining part of the paper, we
student model dynamically represents the inferred knowl-         describe related work, introduce and illustrate the suggested
edge level of the student for each knowledge element (KE)        approach, describe graph and performance metrics, and re-
(also called knowledge component or KC) defined in a do-         port the results of the correlation analysis.
main model. These knowledge levels are computed as the
student answers questions or solves problems that are mapped
to the domain KEs. Student models are frequently built over
                                                                 2.   RELATED WORK
networked domain models where KEs are connected by pre-          Graph representation of student activity is not new. The
requisite, is-a, and other ontological relationships that are    2014 version of the Graph-Based Educational Data Mining
used to propagate the knowledge levels and produce a more        Workshop 1 contains two broad types of related work: the
accurate representation of the knowledge of the learner. Since   analysis of the networking interaction among students, for
these connections belong to domain models, they stay the         example work on social capital [14] and social networking in
                                                                 MOOCs [3, 12]; and analyses of learning paths over graph
                                                                 representation of student traces while performing activities
                                                                 in the system [1, 5]. Our work fits in the second type since
                                                                 we model traces of each student interacting with the sys-
                                                                 tem. However, our approach is different as it attempts to
                                                                 combine an underlying conceptual model with the traces of
                                                                 the student learning.

                                                                 1
                                                                   http://ceur-ws.org/Vol-1183/gedm2014_proceedings.
                                                                 pdf
A considerable amount of work focused on graph repre-
sentation of domain models that serve as a basis for over-
lay student models. The majority of this work focused on
constructing the prerequisite relationships between domain
knowledge components (concept, skills) [6, 13]. In this case
links established between a pair of concepts represent prereq-
uisite - outcome relationship. Another considerable stream
of work explored the use of formal ontologies with such re-
lationships as is-a and part-of for connecting domain knowl-
edge components [7]. Ontological representation, in turn,
relates to another stream of work that applies graph tech-
niques to structural knowledge representation, for example
by analyzing the network properties of ontologies [9].                            Figure 1: Exercise jwhile1

The research on graph-based domain models also leads to
a stream of work on using Bayesian networks to model the
relationships between domain concepts for knowledge prop-        path length, and average node weight can be good indicators
agation in the process of student modeling [15, 4]. Yet, in      of student knowledge compared to the amount of activities
both cases mentioned above links between knowledge com-          done or overall measures of assessment like success rate on
ponents were not parts of individual student model, but ei-      exercises. We further explore these graph metrics in relation
ther parts of the domain model or student modeling pro-          with motivational factors drawn from a learning motivation
cess and thus remain the same for all students. In con-          theory.
trast, the approach suggested in this paper adds links be-
tween knowledge components to individual student models          3.1     Domain Model and Content
to express combinations of knowledge components that the
                                                                 Our content corpus is composed by a set of 112 interactive
given student explored in a problem solving or assessment
                                                                 parameterized exercises (i.e., questions or problems) in the
process. This approach is motivated by our belief that in
                                                                 domain of Java programming from our system QuizJet [11].
the programming domain, student knowledge is more effec-
                                                                 Parameterized exercises are generated from a template by
tively modeled by capturing student progress when students
                                                                 substituting a parameter variable with a randomly generated
needed to apply multiple concepts at the same time.
                                                                 value. As a result each exercise can be attempted multiple
                                                                 times. To answer the exercise the student has to mentally
3.   THE APPROACH                                                execute a fragment of Java code to determine the value of a
The idea behind our approach is that knowledge is likely to      specific variable or the content printed on a console. When
be stronger for concepts which are practiced together with       the student answers, the system evaluates the correctness,
a larger variety of other concepts. We hypothesize, for ex-      reports to the student whether the answer was correct or
ample, that a student who solves exercises, in which the         wrong, shows the correct response, and invites the student
concept for-loop is used with post-incremental operator and      to “try again”. As a result, students may still try the same
post-decremental operator will have a better understanding       exercises even after several correct attempts. An example of
of for-loop than another student who practices (even the         parameterized java exercise can be seen in Figure 1.
same amount of times) the for loops concept in a more nar-
row context, i.e., only with post-incremental operator. To       In order to find the concepts inside all of the exercises, we
represent our approach, for each student we build a graph-       used a parser [10] that extracts concepts from the exercise’s
based student model as a network of concepts where the           template code, analyzes its abstract syntax tree (AST), and
edges are created as the student succeeds in exercises con-      maps the nodes of the AST (concepts extracted) to the nodes
taining both concepts to be connected. The Domain Model          in a Java ontology 2 . This ontology is a hierarchy of pro-
defining the concept space and the mapping between the           gramming concepts in the java domain and the parser uses
concepts and programming exercises is explained in the next      only the concepts in the leaf nodes of the hierarchy.
section. The weight of the edges in the graph is computed
as the overall success rate on exercises performed by the        In total there are 138 concepts extracted and mapped to
student which contain the pair of concepts. Pairs of con-        QuizJet exercises. Examples of concepts are: Int Data Type,
cepts that do not co-occur in exercises succeeded by the stu-    Less Expression, Return Statement, For Statement, Subtract
dent are not connected in her graph. In this representation,     Expression, Constant, Constant Initialization Statement, If
highly connected nodes are concepts successfully practiced       Statement, Array Data Type, Constructor Definition, etc.
with different other concepts. We also compute a measure         We excluded 8 concepts which appear in all exercise tem-
of weight for each node by taking the average weight among       plates (for example “Class Definition” or “Public Class Spec-
edges connecting the node. This measure of the success rate      ifier” appear in the first line of all exercises). Each concept
on concepts favors exercises that connect more concepts be-      appears in one or more Java exercises. Each of the 112 ex-
cause each exercise containing n concepts produce or affects     ercises maps to 2 to 47 Java concepts. For example, the
n(n − 1)/2 edges. For example, a success on an exercise hav-     exercise “jwhile1”, shown in Figure 1, is mapped to 5 con-
ing 10 concepts contributes to 45 edges, but a successful at-    cepts: Int Data Type, Simple Assignment Expression, Less
tempt to an exercise connecting 5 concepts only contributes      Expression, While Statement, Post Increment Expression.
to 10 edges. We hypothesize that in a graph built following
                                                                 2
this approach, metrics like average degree, density, average         http://www.sis.pitt.edu/~paws/ont/java.owl
3.2   Graph Metrics                                             Table 1: Correlation between activity measures and grade
To characterize the student knowledge graph we computed         and between graph metrics and grade. * significance at 0.1,
standard graph metrics listed below.                            ** significance at 0.05.
                                                                 Measure of Activity                 Corr. Coeff.    Sig. (p)
   • Graph Density (density): the ratio of the number of         Correct Attempts to Exercises          .046           .553
     edges and the number of possible edges.                     Distinct Correct Exercises             .114           .147
   • Graph Diameter (diameter): length of the longest            Overall Success Rate                   .298          .000**
     shortest path between every pair of nodes.                  Avg. Success Rate on Concepts          .188          .016**
   • Average Path Length (avg.path.len): average among
     the shortest paths between all pairs of nodes.              Graph Metric                        Corr. Coeff.    Sig. (p)
   • Average Degree (avg.degree): average among the              Average Degree                         .150           .055*
     degree of all nodes in an undirected graph.                 Graph Density                          .102            .190
   • Average Node Weight (avg.node.weight): the weight           Graph Diameter                         .147            .081
     of a node is the average of the weight of its edges. We     Average Path Length                    .152           .052*
     then average the weights of all nodes in the graph.         Average Node Weight                    .201          .010**

3.3   Measures of Activity
To measure student activity so that it could be correlated      4. EXPERIMENTS AND RESULTS
with the graph metrics we collected and calculated the fol-     4.1 Dataset
lowing success measures:                                        We collected student data over three terms of a Java Pro-
                                                                gramming course using the system: Fall 2013, Spring 2014,
   • Correct Attempts to Exercises (correct.attempts):          and Fall 2014. Since the system usage was not mandatory,
     total number of correct attempts to exercises. It in-      we want to exclude students who just tried the system while
     cludes repetition of exercises as well.                    likely using other activities (not captured by the system)
   • Distinct Correct Exercises (dist.correct.attempts):        for practicing Java programming. For this we looked at the
     number of distinct exercise attempted successfully.        distribution of distinct exercises attempted and we exclude
   • Overall Success Rate (success.rate): the number            all student below the 1st quartile (14.5 distinct exercises
     of correct attempts to exercises divided by the total      attempted). This left 83 students for our analysis. In to-
     number of attempts.                                        tal these students made 8,915 attempts to exercises. On
   • Average Success Rate on Concepts                           average, students have attempted about 55 (Standard Devi-
     (avg.concept.succ.rate): we compute the success rate       ation=22) distinct exercises while performing an average of
     of each concept as the average success rate of the ex-     107 (SD=92) exercises attempts. On average, students have
     ercises containing the concept. Then we average this       covered about 63 concepts with SD=25 (i.e., succeeded in at
     among all concepts in the domain model.                    least one exercise containing the concept), and have covered
                                                                about 773 concept pairs with SD=772 (i.e., succeeded in at
                                                                least one exercise covering the concept pair.) The average
3.4   Motivational Factors                                      success rate ( #correct attempts
                                                                                                 ) across students is about 69%
We use the revised Achievement-Goal Orientation question-                       #total attempts

naire [8] which contains 12 questions in a 7-point Likert       (SD=11%).
scale. There are 3 questions for each of the 4 factors of
the Achievement-Goal Orientation framework : Mastery -          4.2    Graph Metrics and Learning
Approach, Mastery-Avoidance, Performance-Approach and           We compare graph metrics (Avg. Degree, Graph Density,
Performance-Avoidance. Mastery-Approach goal orien-             Graph Diameter, Avg. Path Length and Avg Node Weight)
tation relates to intrinsic motivation: “I want to learn this   and measures of activity (Correct Attempts to Exercises,
because it is interesting for me”, “I want to master this       Distinct Correct Exercises, Overall Success Rate and Avg.
subject”; Mastery-Avoidance relates to the attitude of          Success Rate on Concepts) by computing the Kendall’s τB
avoid to fail or avoid learning less than the minimum; Per-     correlation of these metrics with respect to the students’
formance -Approach goal orientation stresses the idea of        grade on the programming course. Results are displayed in
having a good performance and relates well with social com-     Table 1.
parison: “I want to perform good in this subject”, “I want to
be better than others here”; and Performance-Avoidance          Surprisingly, the plain Overall Success Rate (which does
oriented students avoid to get lower grades or avoid to per-    not consider concepts disaggregation, nor graph informa-
form worse than other students. The goal orientation of         tion) is better correlated with course grade than any other
a student helps to explain the behavior that the student        measure. Students who succeed more frequently, get in gen-
exposes when facing difficulty, but does not label the final    eral better grades. Interestingly, both the Average Suc-
achievement of the student. For example, if a student is        cess Rate on Concepts and the Average Node Weight
Mastery-Approach oriented, it does not necessarily mean         are both significantly correlated with grade. This last mea-
that the student reached the mastery level of the skill or      sure uses the graph information and presents a slightly bet-
knowledge. In our case, we believe the achievement-goal ori-    ter correlation than the former, which does not consider the
entation of the student can convey the tendency to pursue       graph information.
(or avoid) to solve more diverse (and more difficult) exer-
cises, which contain more heterogeneous space of concepts,      Among the other graph metrics, Average Degree and Av-
thus contribute to form better connected graphs.                erage Path Length are marginally correlated with course
                                                                  which extent the motivational profile of the student explains
                                                                  the graph’s shape. Step-wise regression models were used
                                                                  where the dependent variables are the graph metrics and
                                                                  the independent variables are the motivational factors. We
                                                                  found a significant model of the diameter of the graph (R2 =
                                                                  0.161, F = 6.523, p = 0.006) with the factors Mastery-
                                                                  Avoidance (B = 0.952, p = 0.001) and Mastery-Approach
                                                                  (B = −0.938, p = 0.006). Note the negative coefficient for
                                                                  Mastery-Approach and the positive coefficient for Mastery-
                                                                  Avoidance. As the Achievement-Goal Orientation frame-
      Figure 2: Graph representation of two students.             work suggests, Mastery-Approach oriented students are mo-
                                                                  tivated to learn more, tend to explore more content and do
                                                                  not give up easily when facing difficulties; Mastery-Avoidance
Table 2: Graph metrics, measures of activity and motiva-
                                                                  students, in the other hand, do not cope well with diffi-
tional scores of 2 students.
                                                                  culties and tend to give up. Then, a possible explanation
                                   Student A     Student B        of the results is that, in one hand, students with higher
  Graph Density                      0.077         0.094          Mastery-Approach orientation are more likely to solve diffi-
  Graph Diameter                      2.85          2.00          cult questions which connects more and more distant con-
  Avg. Path Length                    1.77          1.78          cepts which decreases the graph diameter; and on the other
  Avg. Degree                         8.64         10.55          hand, Mastery-Avoidance students avoid difficult exercises
  Avg. Node Weight                    0.49          0.51          containing many concepts, thus making less connections and
  Correct Attempts                     71            83           producing graphs with higher diameters. Correlations be-
  Dist. Correct Exercises              66            61           tween graph metrics and motivational factors confirmed the
  Overall Succ. Rate                  0.82          0.76          relation between Mastery-Avoidance and Graph Diameter
  Avg. Succ.Rate on Concepts          0.50          0.53          (Kendall’s τB = 0.197, p = 0.030). Although these re-
  Mastery-Approach                    0.83          0.78          sults are encouraging, they are not conclusive. For exam-
  Mastery-Avoidance                   0.83          0.56          ple, Mastery-Approach students might just do more work,
  Performance-Approach                 1.0          0.17          not necessarily targeting difficult questions. More analysis
  Performance-Avoidance                1.0           0.0          is needed to deeply explore these issues.
  Grade (%)                           100            97
                                                                  5.   DISCUSSIONS AND CONCLUSIONS
                                                                  In this paper we proposed a novel approach to represent stu-
grade (p values less than 0.1). Although this is a weak ev-       dent model in the form of a dynamic graph of concepts that
idence, we believe that we are in the good track. A higher        become connected when the student succeed in assessment
Average Degree means a better connected graph, thus it            item containing a pair of concepts to be connected. The
follows our idea that highly connected nodes signal more          idea behind this approach is to strengthen the model for
knowledge. Average Path Length is more difficult to in-           those concepts that are applied in more different contexts,
terpret. A higher Average Path Length means a less                i.e., in assessment items containing other different concepts.
connected graph (which contradicts our assumption), but           We applied this approach to data of assessment items an-
also, it can express students reaching more “rear” concepts       swered by real students and analyzed the graph properties
which appear in few more-difficult-exercises and generally        comparing them to several performance measures such as
have longer shortest paths. We think that further explo-          course grade as well as motivational factors. Results showed
ration of metrics among sub-graphs (e.g. a graph for an           that this idea is potentially a good indicator of knowledge
specific topic), and further refinement of the approach to        of the students, but further refinement of the approach is
build edges (e.g. connecting concepts that co-occur close to      needed. We used several measures of the built graphs as
each other in the exercise) could help to clarify these results   descriptors of student knowledge level, and we found that a
                                                                  metric aggregating the success rates of the edges to the level
Figure 2 shows the graphs of 2 students who have similar          of concepts (nodes) is highly correlated to course grade, al-
amount of distinct exercises solved correctly but present dif-    though it does not beat the plain overall success rate of the
ferent graph metrics and motivational profile. See metrics in     student in assessment items.
Table 2. Student B has more edges, lower diameter, higher
density, higher degree, solved less questions more times. Stu-    In the future work, we plan to repeat our analysis using
dent A presents a less connected graph although she he/she        more reliable approaches to construct the knowledge graph.
solved more distinct questions (66 compared to 61 on Stu-         One idea is to use rich information provided by the parser
dent B). Student B has lower Mastery-Avoidance orientation        (mapping between exercises and concepts) to ensure that
score and lower Performance orientation scores than Student       each new link connects concepts that interact considerably
A, which could explain why Student B work result in a bet-        in the program code. This could be done by controlling
ter connected graph. Analyses of Motivational factors are         the concepts proximity in the question code (e.g. only con-
described in the following section.                               sider co-occurrence when concepts are close to each other
                                                                  in the parser tree.) Another approach to keep more reliable
4.3    Metrics and Motivation                                     edges is to consider only a subset of important concepts
We now explore the relationship between motivational fac-         for each problem using feature selection techniques. Also
tors and the graphs of the students. The idea is to see to        we plan to perform analyses of sub-graphs targeting specific
“zones” of knowledge. For example, a partial graph with          [7] P. Dolog, N. Henze, W. Nejdl, and M. Sintek. The
only concepts that belongs to a specific topic, or concepts          personal reader: Personalizing and enriching learning
that are prerequisites of a specific concept. Another inter-         resources using semantic web technologies. In
esting idea relates to recommendation of content: guide the          P. De Bra and W. Nejdl, editors, Adaptive Hypermedia
                                                                     and Adaptive Web-Based Systems, volume 3137 of
student to questions that will connect the isolated parts of         Lecture Notes in Computer Science, pages 85–94.
the knowledge graph or minimize the average path length of           Springer Berlin Heidelberg, 2004.
the graph. Along the same lines, the analysis of the graph       [8] A. J. Elliot and K. Murayama. On the measurement
shortest paths and overall connectivity can help in designing        of achievement goals: Critique, illustration, and
assessment items that better connect distant concepts.               application. Journal of Educational Psychology,
                                                                     100(3):613, 2008.
                                                                 [9] B. Hoser, A. Hotho, R. Jäschke, C. Schmitz, and
6.   REFERENCES                                                      G. Stumme. Semantic network analysis of ontologies.
 [1] N. Belacel, G. Durand, and F. Laplante. A binary                Springer, 2006.
     integer programming model for global optimization of       [10] R. Hosseini and P. Brusilovsky. Javaparser: A
     learning path discovery. In Workshop on Graph-Based             fine-grain concept indexing tool for java exercises. In
     Educational Data Mining.                                        The First Workshop on AI-supported Education for
 [2] P. Brusilovsky and E. Millán. User models for                  Computer Science (AIEDCS 2013), pages 60–63, 2013.
     adaptive hypermedia and adaptive educational               [11] I.-H. Hsiao, S. Sosnovsky, and P. Brusilovsky. Guiding
     systems. In P. Brusilovsky, A. Kobsa, and W. Nejdl,             students to the right questions: adaptive navigation
     editors, The Adaptive Web, volume 4321 of Lecture               support in an e-learning system for java programming.
     Notes in Computer Science, chapter 1, pages 3–53.               Journal of Computer Assisted Learning,
     Springer Berlin Heidelberg, Berlin, Heidelberg, 2007.           26(4):270–283, 2010.
 [3] V. Cateté, D. Hicks, C. Lynch, and T. Barnes.             [12] S. Jiang, S. M. Fitzhugh, and M. Warschauer. Social
     Snag’em: Graph data mining for a social networking              positioning and performance in moocs. In Workshop
     game. In Workshop on Graph-Based Educational Data               on Graph-Based Educational Data Mining, page 14.
     Mining, volume 6, page 10.                                 [13] T. Käser, S. Klingler, A. G. Schwing, and M. Gross.
 [4] C. Conati, A. Gertner, and K. Vanlehn. Using                    Beyond knowledge tracing: Modeling skill topologies
     bayesian networks to manage uncertainty in student              with bayesian networks. In Intelligent Tutoring
     modeling. User modeling and user-adapted interaction,           Systems, pages 188–198. Springer, 2014.
     12(4):371–417, 2002.                                       [14] V. Kovanovic, S. Joksimovic, D. Gasevic, and
 [5] R. Dekel and Y. Gal. On-line plan recognition in                M. Hatala. What is the source of social capital? the
     exploratory learning environments. In Workshop on               association between social network position and social
     Graph-Based Educational Data Mining.                            presence in communities of inquiry. 2014.
 [6] M. C. Desmarais and M. Gagnon. Bayesian student            [15] E. Millán, T. Loboda, and J. L. Pérez-de-la Cruz.
     models based on item to item knowledge structures.              Bayesian networks for student model engineering.
     Springer, 2006.                                                 Computers & Education, 55(4):1663–1683, 2010.
 Graph-based Modelling of Students’ Interaction Data from
          Exploratory Learning Environments

          Alexandra Poulovassilis              Sergio Gutierrez-Santos                  Manolis Mavrikis
             London Knowledge Lab                 London Knowledge Lab                London Knowledge Lab
            Birkbeck, Univ. of London            Birkbeck, Univ. of London           UCL Institute of Education
              ap@dcs.bbk.ac.uk                  sergut@dcs.bbk.ac.uk                 m.mavrikis@lkl.ac.uk


ABSTRACT                                                         learning of algebraic generalisation [15]; and the iTalk2Learn
Students’ interaction data from learning environments has        system that aims to support 8-10 year old students’ learning
an inherent temporal dimension, with successive events be-       of fractions [7]. Both systems provide students with math-
ing related through the “next event” relationship. Exploratory   ematical microworlds in which they undertake construction
learning environments (ELEs), in particular, can generate        tasks: in MiGen creating 2-dimensional tiled models using
very large volumes of such data, making their interpretation     a tool called eXpresser and in iTalk2learn creating fractions
a challenging task. Using two mathematical microworlds           using the FractionsLab tool. In eXpresser, tasks typically re-
as exemplars, we illustrate how modelling students’ event-       quire the construction of several models, moving from spe-
based interaction data as a graph can open up new querying       cific models involving specific numeric values to a general
and analysis opportunities. We demonstrate the possibilities     model involving the use of one or more variables; in parallel,
that graph-based modelling can provide for querying and          students are asked to formulate algebraic rules specifying
analysing the data, enabling investigation of student-system     the number of tiles of each colour that are needed to fully
interactions and leading to the improvement of future ver-       colour their models. In FractionsLab, tasks require the con-
sions of the ELEs under investigation.                           struction, comparison and manipulation of fractions, and
                                                                 students are encouraged to talk aloud about aspects of their
                                                                 constructions, such as whether two fractions are equivalent.
Keywords
Exploratory Learning Environments, Interaction Data, Graph       Both systems include intelligent components that provide
Modelling                                                        different levels of feedback to students, ranging from un-
                                                                 solicited prompts and nudges, to low-interruption feedback
1.   INTRODUCTION                                                that students can choose to view if they wish. The aim
Much recent research has focussed on Exploratory Learn-          of this feedback is to balance students’ freedom to explore
ing Environments (ELEs) which encourage students’ open-          while at the same time providing sufficient support to en-
ended interaction within a knowledge domain, coupled with        sure that learning is being achieved [9]. The intelligent sup-
intelligent techniques that aim to provide pedagogical sup-      port is designed through detailed cognitive task analysis and
port to ensure students’ productive interaction [9]. The data    Wizard-of-Oz studies [13], and it relies on meaningful indica-
gathered from students’ interactions with such ELEs pro-         tors being detected as students are undertaking construction
vides a rich source of information for both pedagogical and      tasks. Examples of such indicators in MiGen are ‘student
technical research, to help understand how students are us-      has made a building block’ (part of a model), ‘student has
ing the ELE and how the intelligent support that it provides     unlocked a number’ (i.e. has created a variable), ‘student
may be enhanced to better support students’ learning.            has unlocked too many numbers for this task’; while ex-
                                                                 amples of such indicators in FractionsLab are ‘student has
In this paper, we consider how modelling students’ event-        created a fraction’, ‘student has changed a fraction’ (numer-
based interaction data as a graph makes possible graph-          ator or denominator), ‘student has released a fraction’ (i.e.
based queries and analyses that can provide insights into        has finished changing it).
the ways that students are using the affordances of the sys-
tem and the effects of system interventions on students’ be-     Teacher Assistance tools can subscribe to receive real-time
haviour. Our case studies are two intelligent ELEs: the          information relating to occurrences of indicators for each
MiGen system, that aims to foster 11-14 year old students’       student, and can present aspects of this information visu-
                                                                 ally to the teacher [8]. Indicators are either task independent
                                                                 (TI) or task dependent (TD). The former refer to aspects of
                                                                 the student’s interaction that are related to the microworld
                                                                 itself and do not depend on the specific task the student is
                                                                 working on, while the latter require knowledge of the task
                                                                 the student is working on, may relate to combinations of
                                                                 student actions, and their detection requires intelligent rea-
                                                                 soning to be applied (a mixture of case-based, rule-based and
                                                                 probablistic techniques). Detailed discussions of MiGen’s TI
and TD indicators and how the latter are inferred may be                Event                        EventType
found in [8].                                                         dateTime                       eventID
                                                                      taskID                         eventStatus
                                                                                      occurrenceOf
In this paper we explore how graph-based representation of            constrID                       eventCat
                                                                      userID
event-based interaction data arising from ELEs such as Mi-            sessionID
Gen and FractionsLab can aid in the querying and analysis
of such data, with the aim of exploring both the behaviours                          next
of the students in undertaking the exploratory learning tasks
set and the effectiveness of the intelligent support being
provided by the system to the students. Data relating to
                                                                                  Figure 1: Core Graph Data Model
learning environments has often been modelled as a graph
in previous work, for example in [10] for providing support
to moderators in e-discussion environments; in [16, 18] for        There is a relationship ‘next’ linking an instance of Event
supporting learning of argumentation; in [17] for modelling        to the next Event that occurs for the same user, task and
data and metadata relating to episodes of work and learning        session. There is a relationship ‘occurrenceOf’ linking each
in a lifelong learning setting; in [1] for learning path discov-   instance of Event to an instance of the EventType class.
ery as students “navigate” through learning objects; in [3] for
recognising students’ activity planning in ELEs; and in [23]       The instances of the EventType class include: startTask,
for gaining better understanding of learners’ interactions and     endTask, numberCreated, numberUnlocked, unlockedNum-
ties in professional networks.                                     berChanged, buildingBlockMade, correctModelRuleCreated,
                                                                   incorrectModelRuleCreated, interventionGenerated, interven-
Previous work that is close to ours is the work on interac-        tionShown, in the case of eXpresser (see [8] for the full list);
tion networks and hint generation [6, 21, 20, 4, 5], in which      and startTask, endTask, fractionCreated, fractionChanged,
the graphs used consist of nodes representing states within a      fractionReleased, inverventionShown, in the case of Frac-
problem-solving space and edges representing students’ ac-         tionsLab.
tions in transitioning between states. This approach targets
learning environments where students are required to select        We see that instances of the EventType class have several
and apply rules, and the interaction network aims to rep-          attributes, including:
resent concisely information relating to students’ problem-
solving sequences in moving from state to state. Our focus
here differs from this in that we are using graphs to model           • eventID: a unique numerical identifier for each type of
fine-grained event-based interaction data arising from ELEs.            indicator;
In our graphs, nodes are used to represent indicator occur-
rences (i.e. events, not problem states) and edges between            • eventStatus: this may be -1, 0 or 1, respectively stat-
such nodes represent the “next event” relationship. Also,               ing that an occurrence of this type of indicator shows
rather than using the information derived from querying and             that the student is making negative, neutral or posi-
analysing this data to automatically generate hints, our fo-            tive progress towards achieving the task goals; an addi-
cus is on investigating how students are using the system               tional status 2 is used for indicators relating to system
and the effects of the system’s interventions in order to un-           interventions;
derstand how students interact with the ELEs and improve
their future versions.                                                • eventCat: the category into which this indicator type
                                                                        falls; for example, startTask and endTask are task-
                                                                        related indicators; interventionGenerated and inter-
2.   GRAPH-BASED MODELLING                                              ventionShown are system-related ones; numberCreated,
Figure 1 illustrates our Graph Data Model for ELE interac-              numberUnlocked, unlockedNumberChanged are number-
tion data. We see two classes of nodes: Event — represent-              related; and fractionCreated, fractionChanged, frac-
ing indicator occurrences; and EventType — representing                 tionReleased are fraction-related.
different indicator types. The instances of the Event class
are occurrences of indicators that are detected or generated       Figure 2 shows a fragment of MiGen interaction data con-
by the system as each student undertakes a task. We see            forming to this graph data model. Specifically, it relates to
that instances of Event have several attributes: dateTime:         the interactions of user 5 as he/she is working on task 2
the date and time of the indicator occurrence; userID: the         during session 9. The user makes three constructions during
student it relates to; sessionID: the class session that the       this task (with constrIDs 1, 2 and 3). The start and end
student was participating in at the time; taskID: the taskID       of the task are delimited by an occurrence of the startTask
that the student was working on; and constrID: the con-            and endTask indicator type, respectively — events 23041
struction that the student was working on1 .                       and 33154. We see that the two events following 23041 re-
                                                                   late to an intervention being generated and being shown to
1
 The model in Fig. 1 focusses on the interaction data. The         the student (this is likely to be because the student was in-
full data relating to ELEs such as eXpresser and Fraction-         active for over a minute after starting the task); following
sLab would also include classes relating to users, tasks, ses-     which, the student creates a number — event 24115.
sions and constructions; and attributes describing instances
of these classes, such as a user’s name and year-group, a
task’s name and description, a construction’s content and          There are additional attributes relating to events, not shown
description, and a session’s description and duration.             here for simplicity, capturing values relating to the student’s
            23041                                                                                                                        344712
                                                   startTask                                                                                                                    startTask
           dateTime:           occurrenceOf                                                                                             dateTime:           occurrenceOf
           20150331091524                         eventID:0                                                                             20150215091741                         eventID:0
           taskID:2                               eventStatus:0                                                                         taskID:56                              eventStatus:0
                                                  eventCat:taskEv                                                         next                                                 eventCat:taskEv
           constrID:1                                                                                                                   constrID:4
           userID:5                                                                                                                     userID:5
           sessionID:9                                                                                                                  sessionID:1
                                                        intervention-                   intervention-
  next                                                  Generated                       Shown
                                                                                                                               ...                                                  fractionChanged             fractionReleased
              23921                                                                                                                        344758
                                 occurrenceOf         eventID:6001                     eventID:6002                                                           occurrenceOf         eventID:1002                eventID:1003
            dateTime:                                 eventStatus:2                    eventStatus:2                                     dateTime:                                 eventStatus:1               eventStatus:1
            20150331091637                            eventCat:systemEv                eventCat:systemEv                   next          20150215091828                            eventCat:sfractionEv        eventCat:fractionEv
            taskID:2                                                                                                                     taskID:56
            constrID:1                                                                                                                   constrID:4
            userID:5                                                                                                                     userID:5
            sessionID:9                       occurrenceOf                                     numberCreated                             sessionID:1                       occurrenceOf                                interventionShown

                                                                                               eventID:1006                                                                                                            eventID:6002
    next                                                                                       eventStatus:1                     next                                                        occurrenceOf              eventStatus:2
               23923                                                occurrenceOf               eventCat:numberEv                            344759                                                                     eventCat:systemEv
              dateTime:                   24115                                                                                            dateTime:                   344760
              20150331091638            dateTime:                              33154                                                       20150215091828            dateTime:                        344761
              taskID:2                  20150331091655                                                                                     taskID:56                 20150215091832
              constrID:1                                                      dateTime:         occurrenceOf                               constrID:4                                              dateTime:
                                        taskID:2                              20150331094453                                                                         taskID:56                     20150215091833   occurrenceOf
              userID:5                  constrID:1                                                                                         userID:5                  constrID:4
              sessionID:9        next   userID:5
                                                                        ...   taskID:2                  endTask                            sessionID:1        next   userID:5
                                                                                                                                                                                                   taskID:56                       clickButton
                                                                              constrID:3                eventID:9999                                                                               constrID:4                      eventID:3002
                                        sessionID:9            next           userID:5                                                                               sessionID:1            next   userID:5
                                                                                                        eventStatus:0                                                                                                              eventStatus:0
                                                                              sessionID:9               eventCat:taskEv                                                                            sessionID:1                     eventCat:taskEv




                Figure 2: Fragment of Graph Data                                                                                             Figure 3: Fragment of Graph Data


                                                                                                                          of a node would be represented by an edge and its value by
constructions and information relating to the system’s in-                                                                a literal-valued node. So, for example, the information that
terventions. For example, for event 24115, the value of the                                                               the taskID of event 23041 is 2 would be represented by an
number created, say 5; for event 23921, the feedback strat-                                                                            taskID
                                                                                                                          edge 23041 −−−−−→ 2. The query examples in the next section
egy used by the system to generate this intervention, say                                                                 assume this “classical” graph representation.
strategy 8; and for event 23923, the content of the message
displayed to the user, say “How many green tiles do you need
to make your pattern?” and whether this is a high-level in-
                                                                                                                          3.         GRAPH QUERIES AND ANALYSES
terruption by the system or a low-level interruption that                                                                 Because the sub-graph induced by edges labelled ‘next’ con-
the student can choose to view or not. Such information                                                                   sists of a set of paths, the data readily lends itself to explo-
can be captured through additional edges outgoing from an                                                                 ration using conjunctive regular path (CRP) queries [2]. A
                                               value
event instance to a literal-valued node: 24115 −−−→ 5, 23921                                                              CRP query, Q, consisting of n conjuncts is of the form
strategy         message
−−−−−−→ 8, 23932 −−−−−− → “How many green tiles do you need                                                                             (Z1 , . . . , Zm ) ← (X1 , R1 , Y1 ), . . . , (Xn , Rn , Yn )
                                level
to make your pattern?”, 23932 −−−→ “high”. Since graph data
                                                                                                                          where each Xi and Yi is a variable or a constant, each Zi is
models are semi-structured (and graph data therefore does
                                                                                                                          a variable that appears also in the right hand side of Q, and
not need to strictly conform to a single schema), this kind
                                                                                                                          each Ri is a regular expression over the set of edge labels.
of heterogeneity in the data is readily accommodated.
                                                                                                                          In this context, a regular expression, R, has the following
                                                                                                                          syntax:
Figure 3 similarly shows a fragment of FractionsLab inter-
action data, relating to the interactions of user 5 working                                                                                   R :=  | a | | (R1 .R2 ) | (R1 |R2 ) | R∗ | R+
on task 56 during session 1. The user makes one construc-
                                                                                                                          where  denotes the empty string, a denotes an edge label,
tion during this task. We see events relating to the stu-
                                                                                                                          denotes the disjunction of all edge labels, and the operators
dent changing and ‘releasing’ a fraction. Following which
                                                                                                                          have their usual meaning. The answer to a CRP query on a
the system displays a message (in this case, it was a high-
                                                                                                                          graph G is obtained by finding for each 1 ≤ i ≤ n a binary
interruption message of encouragement “Great! Well Done”).
                                                                                                                          relation ri over the scheme (Xi , Yi ), where there is a tuple
                                                                                                                          (x, y) in ri if and only if there is a path from x to y in G
We see from Figures 2 and 3 that the sub-graph induced by
                                                                                                                          such that: x = Xi if Xi is a constant; y = Yi if Yi is a
edges labelled ‘next’ consists of a set of paths, one path for
                                                                                                                          constant; and the concatenation of the edge labels in the
each task undertaken by a specific user in a specific session.
                                                                                                                          path satisfies the regular expression Ri . The answer is then
The entire graph is a DAG (directed acyclic graph): there
                                                                                                                          given by forming the natural join of the binary relations
are no cycles induced by the edges labelled ‘next’ since each
                                                                                                                          r1 , . . . , rn and finally projecting on Z1 , . . . , Zm .
links an earlier indicator occurrence to a later one; while
the instances of EventType and other literal-valued nodes
                                                                                                                          To illustrate, the following CRP query returns pairs of events
can have only incoming edges. The entire graph is also a
                                                                                                                          x, y such that x is an intervention message shown to the user
bipartite graph, with the two parts comprising (i) the in-
                                                                                                                          by the system and y indicates that the user’s next action –
stances of Event, and (ii) the instances of EventType and
                                                                                                                          in eXpresser – was to create a number (note, variables in
the literal-valued nodes.
                                                                                                                          queries are distinguished by an initial question mark):
As a final observation, we note that Figures 1 – 3 adopt
a “property graph” notation (e.g. as used in the Neo4J                                                                    (?X,?Y) <- (?X,occurrenceOf,interventionShown),
graph database, neo4j.com) in which nodes may have at-                                                                               (?X,next,?Y),
tributes. In a “classical” graph data model, each attribute                                                                          (?Y,occurrenceOf,numberCreated)
The result would contain pairs such as (23923,24115) from          (?X,?Y,?Z) <- (?X,occurrenceOf,interventionGenerated),
Figure 2, demonstrating that there are indeed situations                         (?X,constrID,?C), (?X,next+,?Y),
where an intervention message displayed by the MiGen sys-                        (?Y,constrlID,?C), (?Y,occurrenceOf,?Z)
tem leads directly to the creation of a number by the student.

The following query returns pairs of events x, y such that         The result would contain triples such as
that x is an intervention message shown to the user by the         (23921, 23923, interventionShown),
system and y is the user’s next action; the type of y is also      (23921, 24115, numberCreated),
returned, through the variable ?Z:                                 (23921, 24136, numberUnlocked),
                                                                   (23921, 24189, unlockedNumberChanged),
                                                                   relating to construction 1 made by user 5 during session 9
(?X,?Y,?Z) <- (?X,occurrenceOf,interventionShown),                 for task 2 (two more events — 24136 and 24189 — relat-
              (?X,next,?Y),                                        ing to construction 1 have been assumed here, in addition
              (?Y,occurrenceOf,?Z)                                 to 23923 amd 24115 shown in Figure 2, for illustrative pur-
                                                                   poses). The results would not contain (23921,33154,end-
The result would contain triples such as (23923,24115,num-         Task), since event 33154 relates to construction 3.
berCreated) from Figure 2 and (344760,344761,clickButton)
from Figure 3, allowing researchers to see what types of           To show more clearly the answers to the previous query in
events directly follow the display of an intervention mes-         the form of possible event paths, we can use extended regular
sage. This would allow the confirmation or contradiction of        path (ERP) queries [11], in which a regular expression can
researchers’ expectations regarding the immediate effect of        be associated with a path variable and path variables can
intervention messages on students’ behaviours.                     appear in the left-hand-side of queries. Thus, for example,
                                                                   the following query returns the possible paths from x to y:
Focussing for the rest of this section on the data in Figure 2,
the following query returns pairs of events x, y such that x is
                                                                   (?X,?P,?Y,?Z) <-
any type of event and y indicates that the user’s next action
                                                                         (?X,occurrenceOf,interventionGenerated),
was to unlock a number; the type of x is also returned,
                                                                         (?X,constrID,?C), (?X,next+:?P,?Y),
through the variable ?Z:
                                                                         (?Y,constrID,?C), (?Y,occurrenceOf,?Z)

(?X,?Y,?Z) <- (?X,occurrenceOf,?Z),
              (?X,next,?Y),                                        The result would contain answers such as
              (?Y,occurrenceOf,numberUnlocked)                     (23921, [next], 23923, interventionShown),
                                                                   (23921, [next, 23923, next], 24115, numberCreated),
                                                                   (23921, [next, 23923, next, 24115, next], 24136, numberUn-
The result would allow researchers to see what types of
                                                                   locked),
events immediately precede the unlocking of a number (i.e.
                                                                   (23921, [next, 23923, next, 24115, next, 24136, next], 24189,
the creation of a variable). This would allow confirmation
                                                                   unlockedNumberChanged).
of researchers’ expectations about the design of the MiGen
system’s intelligent support in guiding students towards gen-
                                                                   The use of the regular expressions next and next+ in the
eralising their models by changing a fixed number to an ‘un-
                                                                   previous queries matches precisely one edge labelled ‘next’,
locked’ one.
                                                                   or any number of such edges (greater than or equal to 1),
                                                                   respectively. However, for finer control and ranking of query
The following query returns pairs of events x, y such that
                                                                   answers, it is possible to use approximate answering of CRP
that x is an intervention generated by the system and y is
                                                                   and ERP queries (see [11, 17]), in which edit operations such
any subsequent event linked to x through a path comprising
                                                                   as insertion, deletion or substitution of an edge label can be
one or more ‘next’ edges; the type of y is also returned,
                                                                   applied to regular expressions.
through the variable ?Z:
                                                                   For example, using the techniques described in [11, 17], the
(?X,?Y,?Z) <- (?X,occurrenceOf,interventionGenerated),             user can chose to allow the insertion of the label ‘next’ into
              (?X,next+,?Y),                                       a regular expression, at an edit cost of 1. Submitting then
              (?Y,occurrenceOf,?Z)                                 this query:

The result would contain triples such as (23921, 23923, inter-     (?X,?P,?Y,?Z) <-
ventionShown), (23921, 24115, numberCreated), ... (23921,                (?X,occurrenceOf,interventionGenerated),
33154, endTask), allowing researchers to see what types of               (?X,constrID,?C), APPROX(?X,next:?P,?Y),
events directly or indirectly follow the display of an interven-         (?Y,constrID,?C), (?Y,occurrenceOf,?Z)
tion message by the system. This would allow the confirma-
tion or contradiction of researchers’ expectations regarding
the longer-term effect of intervention messages on students’       would return first exact answers, such as
behaviours.                                                        (23921, [next], 23923, interventionShown). The regular ex-
                                                                   pression next in the conjunct APPROX(?X,next:?P,?Y) would
We can modify the query to retain only pairs x, y that relate      then be automatically approximated to next.next, leading
to the same construction:                                          to answers such as
                                                                                                   Transition Matrix(Session 3)




(23921, [next, 23923, next], 24115, numberCreated)                                                        6003      e      s
                                                                                                   6002                          1001
at an edit distance of 1 from the original query. Following                                 6001                                        1002

this, the regular expression next.next would be automati-                            5004                                                      1003

cally approximated to next.next.next, leading to answers                       5003                                                                 1004

such as                                                                      5002                                                                     1005
(23921, [next, 23923, next, 24115, next], 24136, numberUn-
locked)                                                                      5001                                                                      1006


at distance 2. This incremental return of paths of increas-                   3009                                                                    1007

ing length can continue for as long as the user wishes, and                     3008                                                            1008

allows researchers to examine increasingly longer-term ef-                             3007                                                  1009

fects of intervention messages on students’ behaviours. It                                    3006                                    1010
                                                                                                     3002                      1011
would also be possible for users to specify from the outset a                                                1015       1014


minimum and maximum edit distance to be used in approx-                                                                                               0.96342
                                                                                                                                                      0.00024384

imating and evaluating the query, for example to request
paths encompassing between 2 and 4 edges labelled ‘next’.
                                                                        Figure 4: Transitions between Event Types
Queries based on evaluating regular expressions over a graph-
based representation of interaction data, such as those above,
can aid in the exploration of students’ behaviours as they are           applied across a whole dataset, or focussing on partic-
undertaking tasks using ELEs and the effectiveness of the                ular students, tasks or sessions;
intelligent support being provided by the ELE. The query
                                                                       • nodes that have a high probability of being visited on a
processing techniques employed are based on incremental
                                                                         randomly chosen shortest path between two randomly
query evaluation algorithms which run in polynomial time
                                                                         chosen nodes have high betweenness centrality; deter-
with respect to the size of the database graph and the size
                                                                         mining this measure for pairs of event type nodes (ig-
of the query and which return answers in order of increasing
                                                                         noring the directionality of the ‘occurrenceOf’ edges)
edit distance [11]. A recent paper [19] gives details of an
                                                                         would identify event types that play key mediating
implementation, which is based on the construction of an
                                                                         roles between other event types.
automaton (NFA) for each query conjunct, the incremental
construction of a weighted product automaton from each
conjunct’s automaton and the data graph, and the use of           We have already undertaken some ad hoc analyses of in-
a ranked join to combine answers being incrementally pro-         teraction data arising from classroom sessions using ELEs.
duced from the evaluation of each conjunct. The paper also        For example, Figure 4 shows the normalised incoming tran-
presents a performance study undertaken on two data sets          sitions for a 1-hour classroom session involving 22 students
— lifelong learning data and metadata [17] and YAGO [22].         using MiGen (in the diagram, s denotes the ‘startTask’ and
The first of these has rather ‘linear’ data, similar to the in-   e the ‘endTask’ event types). Event types with an adja-
teraction data discussed here, while the second has ‘bushier’     cent circle show transitions where this type of event occurs
connectivity. Query performance is generally better for the       repeatedly in succession. The thickness of each arrow or
former than the latter, and the paper discusses several pos-      circle indicates the value of the transition probability: the
sible approaches towards query optimisation.                      thicker the line, the higher the probability. Red (light grey)
                                                                  is used for probabilities < 0.2 and black for probabilities
In addition to evaluating queries over the interaction data,      ≥ 0.2. We can observe a black arrow 3007 → 1005, indicat-
by representing the data in the form of a graph it is possible    ing transitions from events of type 3007 (detection by the
to apply graph structure analyses such as the following:          system that the student has made an implausible building
                                                                  block for this task) to events of type 1005 (modification of a
   • path finding and clustering: this would be useful for        rule by the student). Such an observation raises a hypoth-
     determining patterns of interest across a whole dataset,     esis for more detailed analysis or further student observa-
     or focussing on particular students, tasks or sessions       tion, namely: “does the construction of an incorrect building
     c.f. [4];                                                    block lead students to self-correct their rules?”. Developing
                                                                  a better understanding of such complex interaction can lead
   • average path length: this would be useful for determin-      to improvement of the system. For this particular example,
     ing the amount of student activity (i.e. the number of       we designed a new prompt that suggests to students to first
     indicator occurrences being generated per task) across       consider the building block against the given task before
     a whole dataset, or focussing on particular students,        proceeding unnecessarily in correcting their rules. More ex-
     tasks or sessions;                                           amples of such ad hoc analyses are given in [14]. Represent-
                                                                  ing the interaction data in graph form will allow more sys-
   • graph diameter: to determine the greatest distance be-       tematic, flexible and scalable application of graph-structure
     tween any two nodes (which, due to the nature of the         algorithms such as those identified above.
     data, would be event type nodes); this would be an in-
     dication the most long-running and/or most intensive
     task(s);                                                     4.    CONCLUSIONS AND FUTURE WORK
                                                                  We have presented a graph model for representing event-
   • degree centrality: determining the in-degree centrality      based interaction data arising from Exploratory Learning
     of event type nodes would identify key event types oc-       Environments, drawing on the data generated when students
     curring in students’ interactions; this analysis could be    undertake exploratory learning tasks with the eXpresser and
FractionsLab microworlds. Although developed in the con-                networks: Generating high level hints based on
text of these systems, the model is a very general one and              network community clustering. EDM, 2012.
can easily be used or extended to model similar data from           [7] B. Grawemeyer and et al. Light-bulb moment?:
other ELEs.                                                             Towards adaptive presentation of feedback based on
                                                                        students’ affective state. IUI, pages 400–404, 2015.
We have explored the possibilities that evaluating regular          [8] S. Gutierrez-Santos, E. Geraniou, D. Pearce-Lazard,
path queries over this graph-based representation might pro-            and A. Poulovassilis. Design of Teacher Assistance
vide for exploring the behaviours of students as they are               Tools in an Exploratory Learning Environment for
working in the ELE and the effectiveness of the intelligent             Algebraic Generalization. IEEE Trans. Learn. Tech.,
support that it provides to them. We have also identified               5(4):366–376, 2012.
additional graph algorithms that may yield further insights         [9] S. Gutierrez-Santos, M. Mavrikis, and G. D. Magoulas.
about learners, tasks and significant indicators.                       A Separation of Concerns for Engineering Intelligent
                                                                        Support for Exploratory Learning Environments. J.
Planned worked includes transformation and uploading of                 Research and Practice in Inf. Tech., 44:347–360, 2013.
the interaction data sets gathered during trials and full class-   [10] A. Harrer, R. Hever, and S. Ziebarth. Empowering
room sessions of the two systems into an industrial-strength            researchers to detect interaction patterns in
graph database such as Neo4J, following the graph model                 e-collaboration. Frontiers in Artificial Intelligence and
presented in Section 2; followed by the design, implemen-               Applications, 158:503, 2007.
tation and evaluation of meaningful queries, analyses and
                                                                   [11] C. Hurtado, A. Poulovassilis, and P. Wood. Finding
visualisations over the graph data, building on the work
                                                                        top-k approximate answers to path queries. FQAS,
presented in Section 3. Equipped with an appropriate user
                                                                        pages 465–476, 2009.
interface, educational researchers, designers or even teach-
ers with less technical expertise could in this way explore        [12] K. Koedinger and et al. A data repository for the
the data from their perspective. This has the potential to              EDM community: The PSLC datashop. Handbook of
lead to an improved understanding of interaction in this                Educational Data Mining, 43, 2010.
context and to feed back to the design of the ELEs. We             [13] M. Mavrikis and S. Gutierrez-Santos. Not all Wizards
see this approach very much in the spirit of “polyglot per-             are from Oz: Iterative design of intelligent learning
sistence” (i.e. using different data storage methods to ad-             environments by communication capacity tapering.
dress different data manipulation problems), and hence be-              Computers and Education, 54(3):641–651, 2010.
ing used in conjunction with other EDM resources such as           [14] M. Mavrikis, Z. Zheng, S. Gutierrez-Santos, and
DataShop [12]. Another direction of research is investigation           A. Poulovassilis. Visualisation and analysis of
of how the flexible querying processing techniques for graph            students’ interaction data in exploratory learning
data (including both query approximation and query relax-               environments. Workshop on Web-Based Technology
ation) that have been developed in the context of querying              for Training and Education (at WWW), 2015.
lifelong learners’ data and metadata [11, 17] might be ap-         [15] R. Noss and et al. The design of a system to support
plied or adapted to the much finer-granularity interaction              exploratory learning of algebraic generalisation.
data described here and the more challenging pedagogical                Computers and Education, 59(1):63–82, 2012.
setting of providing effective intelligent support to learners     [16] N. Pinkwart and et al. Graph grammars: An ITS
undertaking exploratory tasks in ELEs.                                  technology for diagram representations. FLAIRS,
                                                                        pages 433–438, 2008.
Acknowledgments                                                    [17] A. Poulovassilis, P. Selmer, and P. Wood. Flexible
                                                                        querying of lifelong learner metadata. IEEE Trans.
This work has been funded by the ESRC/EPSRC MiGen
                                                                        Learn. Tech., 5(2):117–129, 2012.
project, the EU FP7 projects iTalk2Learn (#318051) and
M C Squared (#610467). We thank all the members of                 [18] O. Scheuer and B. McLaren. CASE: A configurable
these projects for their help and insights.                             argumentation support engine. IEEE Trans. Learn.
                                                                        Tech., 6(2):144–157, 2013.
                                                                   [19] P. Selmer, A. Poulovassilis, and W. P.T. Implementing
5.   REFERENCES                                                         flexible operators for regular path queries. GraphQ
 [1] N. Belacel, G. Durand, and F. LaPlante. A binary                   (EDBT/ICDT Workshops), pages 149–156, 2015.
     integer programming model for global optimization of          [20] V. Sheshadri, C. Lynch, and T. Barnes. InVis: An
     learning path discovery. G-EDM, 2014.                              EDM tool for graphical rendering and analysis of
 [2] D. Calvanese and et al. Containment of conjunctive                 student interaction data. G-EDM, 2014.
     regular path queries with inverse. KRR, pages                 [21] J. Stamper, M. Eagle, T. Barnes, and M. Croy.
     176–185, 2015.                                                     Experimental evaluation of automatic hint generation
 [3] R. Dekel and K. Gal. On-line plan recognition in                   for a logic tutor. Artificial Intelligence in Education,
     exploratory learning environments. G-EDM, 2014.                    pages 345–352, 2011.
 [4] M. Eagle and T. Barnes. Exploring differences in              [22] F. Suchanek, G. Kasneci, and G. Weikum. YAGO: a
     problem solving with data-driven approach maps.                    core of semantic knowledge. WWW, 2007.
     EDM, 2014.                                                    [23] D. Suthers. From contingencies to network-level
 [5] M. Eagle, D. Hicks, B. Peddycord III, and T. Barnes.               phenomena: Multilevel analysis of activity and actors
     Exploring networks of problem-solving interactions.                in heterogeneous networked learning environments.
     LAK, pages 21–30, 2015.                                            LAK, 2015.
 [6] M. Eagle, M. Johnson, and T. Barnes. Interaction
   Exploring the function of discussion forums in MOOCs:
    comparing data mining and graph-based approaches
          Lorenzo Vigentini                                                                            Andrew Clayphan
      Learning & Teaching Unit                                                                    Learning & Teaching Unit
           UNSW Australia,                                                                             UNSW Australia,
   Lev 4 Mathews, Kensington 2065                                                              Lev 4 Mathews, Kensington 2065
          +61 (2) 9385 6226                                                                           +61 (2) 9385 6226
     l.vigentini@unsw.edu.au                                                                     a.clayphan@unsw.edu.au


ABSTRACT                                                               learning environments [6]. These can be deployed in a variety of
In this paper we present an analysis (in progress) of a dataset        ways, ranging from a tangential support resource which students
containing forum exchanges from three different MOOCs. The             can refer to when they need help, to a space for learning with
forum data is enhanced because together with the exchanges and         others, driven by the activities students have to carry out (usually
the full text, we have a description of the design and pedagogical     sharing work and eliciting feedback). The latter, in a sense,
function of forums in these courses and a certain level of detail      emulates class-time in traditional courses providing a space for
about the users, which includes achievement, completion, and in        structured discussions about the topics of the course. One could
some instances more details such as: education; employment; age;       argue that like in face-to-face classes, the value of the interaction
and prior MOOC exposure.                                               depends on the importance attributed to the forums by the
                                                                       instructors. This is an interesting point to explore teachers’
Although a direct comparison between the datasets is not possible      presence and the value of their input in directing such
because the nature of the participants and the courses are             conversations. Mazzolini & Maddison characterize the role of the
different, what we hope to identify using graph-based techniques       teacher and teacher presence in online discussion forums as
is a characterization of the patterns in the nature and development    varying from being the ‘sage on the stage’, to the ‘guide on the
of communication between students and the impact of the ‘teacher       side’ or even ‘the ghost in the wings’ [7]. Furthermore they argue
presence’ in the forums. With the awareness of the differences, we     that the ‘ideal’ degree of visibility of the instructor in discussion
hope to demonstrate that student engagement can be directed ‘by-       forums depends on the purpose of forums and their relationship to
design’ in MOOCs: teacher presence should therefore be planned         assessment. There are also a number of accounts indicating that
carefully in the design of large-scale courses.                        students’ learning in forums is not very effective [8, 9]. However
                                                                       if one looks at the data there are numerous examples indicating
Keywords                                                               that behaviours in forums are good predictors of performance in
MOOCs, Discussion forums, graph-based EDM, pedagogy.                   the courses using them, particularly if forum activities are
                                                                       assessed [10,11,12,13]. Yet, forums in MOOCs tend to attract
                                                                       only a small portion of the student activity [14]. This is setting
1. INTRODUCTION                                                        forums in MOOCs apart from ‘tutorial-type’ forums used to
In the past couple of years MOOCs (Massive Open Online                 support students’ learning in online or blended courses in higher
Courses) have become the center of much media hype as                  education. Furthermore, some argue that active engagement is not
disruptive and transformational [1, 2]. Although the focus has         the only way of benefiting from discussion forums [15] and
been on a few characteristics of the MOOCS – i.e. free courses,        students’ characteristics and preferences could be more important
massive numbers, massive dropouts and implicit quality                 than the course design in determining the way in which they take
warranted by the status of the institutions delivering these courses   full advantage of online resources [16].
– a rapidly growing research interest has started to question the
effectiveness of MOOCS for learning and their pedagogies. If one
ignores entirely the philosophies of teaching driving the design
                                                                       2. THE THREE MOOCS IN DETAIL
and delivery of MOOCs going from the the socio-constructivist          In order to investigate the way in which students use the
(cMOOC, [4, 5]) to instructivist (xMOOC, [3]), at the practical        discussion forums, we have extracted data from three MOOCS
level, instructors have to make specific choices about how to use      delivered by a large, research intensive Australian university. The
the tools available to them. One of these tools is the discussion      three courses are: P2P (From Particles to Planets - Physics);
forum. Forums are one of the most popular asynchronous tools to        LTTO (Learning to Teach Online); and INTSE (Introduction to
support students’ communication and collaboration in web-based         Systems Engineering), which are broadly characterised in the top
                                                                       of Table 1. The courses were specifically designed in quite
                                                                       different ways to test hypotheses about their design, delivery and
                                                                       effectiveness.
                                                                       In particular, P2P was designed emulating a traditional university
                                                                       course in a sequential manner. All content was released on a
                                                                       week-by-week basis dictating the pace of instruction. LTTO and
                                                                       INTSE, instead were designed to provide a certain level of
                                                                       flexibility for the students to elect their learning paths. All content
was readily available at the start, however for LTTO, the delivery      in forum is 4x in magnitude compared to the other courses. Yet, if
followed a week-on-week delivery focusing on the interaction            we look at the average amount of posts or comments, the patterns
with students and a selective attention to particular weekly topics     are not straightforward to interpret, as the level of engagement is
(i.e. weekly feedback videos driven by the discussion forums as         similar across the courses with 3 to 5 posts per student and 1 to 3
well as weekly announcements). Although announcements were              comments (i.e. replies to existing posts), but with P2P showing a
used also in INTSE, the lack of weekly activities in the forums did     higher level of engagement than the other courses. One possible
not impose a strong pacing. In INTSE, the forums had only a             explanation is the different target group of the different courses
tangential support value and were used mainly to respond to             with INTSE including a majority of professional engineers with
students’ queries and to clarify specific topics emerging from the      postgraduate qualifications, P2P focusing on high school student
quizzes. Table 1 provides an overview of the different courses.         and teachers, and LTTO targeting a broad base of teachers across
This also shows that the forum activity in the various courses is a     different educational levels.
very small portion of all actions emerging from the logs of activity
which has been reported in the literature [9].                                                INTSE             LTTO               P2P
                                                                                                             Teachers at all   High school
                                                                        Target group         Engineers
3. DETAILS OF THE DATASET                                                                                        levels        and teachers
3.1 The dataset                                                         Course length         9 weeks           8 weeks          8 weeks
The data under consideration is an export form the Coursera                                     54                105               63
                                                                        Forums
platform. Raw forum database tables (posts, comments, tags,                                 (14 top level)    (17 top-level)   (15 top-level)
votes) as well as a JSON based web clickstream were used. The           Design mode         All-at-once       All-at-once      Sequential
clickstream events consist of a key which specifies action – either
a ‘pageview’ or ‘video’ item. Forum clickstream events were             Delivery mode       All-at-once        Staggered        Staggered
identified by a common ‘/forum’ prefix.                                 Use of forums       Tangential       Core activity      Support
The clickstream was further classified into: browsing; profile          N in forum          422 (2.1%)       1685 (9.3%)       293 (2.8%)
lookups; social interaction (looking at contributions); search;
tagging; and threads. From the classification it became evident the                            1361             6361              1399
                                                                        Tot posts
clickstream did not record all events, such as when a post or                                (avg=3.3)        (avg=3.8)         (avg=4.8)
comment was made, or when votes were applied. For these,                                        285             2728               901
                                                                        Tot comments
specific database tables were used. In order to manage different                             (avg=0.7)        (avg=1.7)         (avg=3.1)
data sets and sources, a standardized schema was built, allowing        Registrants            32705             28558            22466
disparate sources to feed into, but exposing a common interface to
conduct analysis over forum activities. This is shown in Figure 1.      Active
                                                                                                60%               63%              47%
                                                                        students1
          Figure 1. Forum data transformation process                                          4.2%              4.4%             0.7%
                                                                        Completing2
                                                                                              (0.3% D)           (2.4 D)          (0.2%)
                                                                        Table 1. Summary of the courses under investigation. NOTE:
                                                                         1. Active students are those appearing in the clickstream; 2.
                                                                        Completing students achieve the pass grade or Distinction (D)
                                                                        The type of activity is summarised in Figure 2. In the chart, the
                                                                        five categories refer to the following: View corresponds to listing
                                                                        forums, threads and viewing posts; Post is the writing of a post or
                                                                        start of a new thread; Comment is a reply to an existing post;
                                                                        Social refers to all actions engaging directly with other’s status
                                                                        (up-vote, down-vote and looking at profiles/reputation); Engage
                                                                        refers to the additional interaction with forums content (searching,
                                                                        tagging, ‘watching’ or subscribing to posts or threads).
                                                                        The viewing behaviour is the most prominent for both the student
                                                                        and instructor groups and the figures are pretty much similar
                                                                        across the board. A two-way ANOVA (2x5, role by activity) on
                                                                        the percentage of distributions, shows that there is no significant
                                                                        difference between students and instructors, but there is an
                                                                        obvious difference between views and the other types of
                                                                        behaviour (F(4,29) = 1656.3, p < .01).
                                                                        If we consider the engagement over the timeline and compare the
                                                                        type of activities carried out by students and instructors, Figure 3
3.2 An overview of forums activity                                      (end of the paper) shows the patterns for the three courses. The
There are very interesting trends which require more detailed           most striking pattern is that there doesn’t seem to be an obvious
examination (bottom of table 1). As expected, in LTTO the forum         one. For what concerns posts and views in all the three courses
activity is larger than in the other courses and this is probably due   there is a sense of synchronicity between the two groups, however
to the fact that students were asked to submit post in forums           from this chart it is not possible to understand in more detail what
following the learning activities. The proportion of active students    are the connections between what students and teachers do.
Instructors’ comments are slightly offset, possibly as a reaction to   activity of students in the forums is organized according to a set of
students’ posts. An interesting aspect is the amount of ‘social’       commonly used quantitative metrics and a couple of measures
engagement in the P2P course that merits further analysis.             borrowed from Social Network Analysis (table 2). Although this
                                                                       seems to be a promising approach, there are two issues with this
4. DIRECTIONS AND OPEN QUESTIONS                                       methodology in the MOOCs: 1) only a tiny proportion of students
From this coarse analysis it is apparent that there seem to be         can be considered active and 2) it is hard to scale the instructor’s
minimal behavioural differences in the way students and                evaluation. The first problem is not easily resolved and it is an
instructors interact in the different courses, however more analysis   issue in the literature reviewed [17, 18]; non-posting behavior is
is required to tackle questions about the individual differences in    considered as an index of disengagement, partly because this is
students’ and instructors’ patterns of interaction and their           easy to measure. In principle the latter could be substituted by
interrelations. Furthermore little can be said about how the nature    peer evaluation (up-vote, down-vote), but there is no easy way to
of interactions drives the development of communication and            ensure consistency.
engagement. However a number of questions like the following
                                                                          Indicator           Type                   Description
remain open and unanswered: how do discussions develop over
time? How teacher presence affects the development of                  Messages           Quantitative     Number of messages written by
discussions? Is the number of forums affecting how students                                                the student.
engage with them (i.e. causing disorientation)?                        Threads            Quantitative     Number of new threads created
                                                                                                           by the student.
       Figure 2. Distribution of forum activities by role
                                                                       Words              Quantitative     Number of words written by
                                                                                                           the student.
                                                                       Sentences          Quantitative     Number of sentences written
                                                                                                           by the student.
                                                                       Reads              Quantitative     Number of messages read on
                                                                                                           the forum by the student.
                                                                       Time               Quantitative     Total time, in minutes, spent
                                                                                                           on forum by the student.
                                                                       AvgScoreMsg        Qualitative      Average score on the
                                                                                                           instructor's evaluation of the
                                                                                                           student's messages.
                                                                       Centrality         Social           Degree centrality of the
                                                                                                           student.
                                                                       Prestige           Social           Degree prestige of the student.
                                                                       Table 2. Possible indicators characterising forum engagement


                                                                       An alternative method that can be explored is graph-based
                                                                       approaches. For example, Bhattacharya et al. [19] used graph-
                                                                       based techniques to explore the evolution of software and source
                                                                       branching providing an insight in the process. Kruck et al. [20]
                                                                       developed GSLAP, an interactive, graph‐based tool for analyzing
                                                                       web site traffic based on user‐defined criteria.
                                                                       Kobayashi et al. [18] used a method to quickly identify and track
                                                                       the evolution of topics in large datasets using a mix of assignment
                                                                       of documents to time slices and clustering to identify discussion
                                                                       topics. Yang et al [21] integrated graph-based clustering to
                                                                       characterize the emergence of communities and text-based
                                                                       analysis to portray the nature of exchanges. In fact, students move
                                                                       in the various sub-forums taking different roles or stances as they
                                                                       engage with different subsets of students. As the reasons to
                                                                       engage in these discussions are partly determined by different
                                                                       interests, goals, and issues, it is possible to construct a social
                                                                       network graph based on the post-reply-comment structure within
                                                                       threads. The network generated provides a possible view of a
4.1 The DM and graph-based approaches                                  student’s social participation within a MOOC, which may indicate
A possible way to answer the questions about the types/patterns of     some detail about their values, beliefs and intentions.
behaviours, the structure and development of networks and the
growth of groups/communities over time might be using data             Furthermore, Brown et al [22] have already shown the value of
mining and graph-based approaches. For example, [6] used a             exploring the communities in discussion forums in MOOCs
combination of quantitative, qualitative and social network            particularly for what concerns the homogeneity of performance
information about forum usage to predict students' success or          but dissimilarity of motivations characterizing student hubs.
failure in a course by applying classification algorithms and
classification via clustering algorithms. In their approach the
4.2 Discussion points                                                  [6] Cristóbal Romero, Manuel-Ignacio López, Jose-María Luna,
The examples above provide evidence of the potential for using             and Sebastián Ventura. 2013. Predicting students’ final
graph-based methods to obtain better insights into the process and         performance from participation in on-line discussion forums.
content analysis for our dataset and to extend its applicability to        Computers & Education 68 (October 2013), 458–472. DOI:
MOOCs, however there are a number of contentious points to                 http://dx.doi.org/10.1016/j.compedu.2013.06.009
raise which will provide opportunities for discussion.                 [7] Margaret Mazzolini and Sarah Maddison. 2007. When to
Firstly the number of students who are actively involved in                jump in: The role of the instructor in online discussion
discussion is a very small proportion of the active participants.          forums. Computers & Education 49, 2 (September 2007),
This means that the subset may not be representative at all. One           193–213.
could argue that these students are already engaged or desperately         DOI:http://dx.doi.org/10.1016/j.compedu.2005.06.011
need help. Previous literature [21, 22, 23] focused on the ability     [8] M.j.w. Thomas. 2002. Learning within incoherent structures:
to predict performance and on the peer effect which can emerge             the space of online discussion forums. Journal of Computer
from the analysis of the graphs/social networks.                           Assisted Learning 18, 3 (September 2002), 351–366.
Secondly, one could question the value of the communities in               DOI:http://dx.doi.org/10.1046/j.0266-4909.2002.03800.x
xMOOCs: especially when courses are designed with an                   [9] Daniel F.O. Onah, Jane Sinclair, and Russell Boyatt. 2014.
instructivits approach leading to mastery, by definition this is an        Exploring the use of MOOC discussion forums. In
individualistic perspective focused on the testing of one’s own            Proceedings of London International Conference on
skills/learning. Of course in cMOOCs -connectivists by design-             Education. London: LICE, 1–4.
the importance of the development of social support is essential.      [10] Alstete, J.W. and Beutell, N.J. Performance indicators in
This seems to be supported by Brown et al [22]: they were not               online distance learning courses: a study of management
able to uncover a direct relation between stated goals and                  education. Quality Assurance in Education 12, 1 (2004), 6–
motivations with the participation in forums, and attributed this to        14.
pragmatic needs. However, as the authors suggested earlier, the
instructors might play a fundamental role in shaping the               [11] Cheng, C.K., Paré, D.E., Collimore, L.-M., and Joordens, S.
communities based on the value attributed to forums in their                Assessing the effectiveness of a voluntary online discussion
plans/design and the level of engagement/interaction. Considering           forum on improving students’ course performance.
the split between cMOOCs and xMOOCs again, interesting work                 Computers & Education 56, 1 (2011), 253–261.
might come out of the experiment conducted by Rose’ and                [12] Palmer, S., Holt, D., and Bray, S. Does the discussion help?
colleagues in the DALMOOC in which automated agents were                    The impact of a formally assessed online discussion on final
deployed to support students’ conversations. In Coursera the                student results. British Journal of Educational Technology
deployment of ‘community mentors’ will be an interesting space              39, 5 (2008), 847–858.
to explore, given that the importance of design seems to be
                                                                       [13] Patel, J. and Aghayere, A. Students’ Perspective on the
removed from instructors in the ‘on-demand’ model.
                                                                            Impact of a Web-based Discussion Forum on Student
Lastly, more research is needed in the time-based dimension of              Learning. Frontiers in Education Conference, 36th Annual,
development of forums in MOOCs. Questions like how students                 (2006), 26–31.
bond and create stable relations, how they become authoritative
                                                                       [14] Jacqueline Aundree Baxter and Jo Haycock. 2014. Roles and
and what motivates them to contribute over time are all open
                                                                            student identities in online large course forums: Implications
questions which the analysis of graphs over time might be able to
                                                                            for practice. The International Review of Research in Open
address.
                                                                            and Distributed Learning 15, 1 (January 2014).

5. REFERENCES                                                          [15] Vanessa Paz Dennen. 2008. Pedagogical lurking: Student
[1] Dirk Jan van den Berg and Edward Crawley. Why MOOCS                     engagement in non-posting discussion behavior. Computers
    Are Transforming the Face of Higher Education. Retrieved                in Human Behavior 24, 4 (July 2008), 1624–1633.
    April 12, 2015 from http://www.huffingtonpost.co.uk/dirk-               DOI:http://dx.doi.org/10.1016/j.chb.2007.06.003
    jan-van-den-berg/why-moocs-are-                                    [16] René F. Kizilcec, Chris Piech, and Emily Schneider. 2013.
    transforming_b_4116819.html                                             Deconstructing Disengagement: Analyzing Learner
[2] Chris Parr. The evolution of Moocs. Retrieved April 12,                 Subpopulations in Massive Open Online Courses. In
    2015 from                                                               Proceedings of the Third International Conference on
    http://www.timeshighereducation.co.uk/comment/opinion/th                Learning Analytics and Knowledge. LAK ’13. New York,
    e-evolution-of-moocs/2015614.article                                    NY, USA: ACM, 170–179.
                                                                            DOI:http://dx.doi.org/10.1145/2460296.2460330
[3] C. Osvaldo Rodriguez. 2012. MOOCs and the AI-Stanford
    Like Courses: Two Successful and Distinct Course Formats           [17] Dennen, V.P. Pedagogical lurking: Student engagement in
    for Massive Open Online Courses. European Journal of                    non-posting discussion behavior. Computers in Human
    Open, Distance and E-Learning (January 2012).                           Behavior 24, 4 (2008), 1624–1633.

[4] George Siemens. 2005. Connectivism: A learning theory for          [18] Kobayashi, M. and Yung, R. Tracking Topic Evolution in
    the digital age. International journal of instructional                 On-Line Postings: 2006 IBM Innovation Jam Data. In T.
    technology and distance learning 2, 1 (2005), 3–10.                     Washio, E. Suzuki, K.M. Ting and A. Inokuchi, eds.,
                                                                            Advances in Knowledge Discovery and Data Mining.
[5] Stephen Downes. 2008. Places to go: Connectivism &                      Springer Berlin Heidelberg, 2008, 616–625.
    connective knowledge, Innovate.
[19] Bhattacharya, P., Iliofotou, M., Neamtiu, I., and Faloutsos,    [22] R. Brown, C Lynch, Y. Wang, M. Eagle, J. Albert, T.
     M. Graph-based Analysis and Prediction for Software                  Barnes, R. Baker, Y. Bergner, D. McNamara. Communities
     Evolution. Proceedings of the 34th International Conference          of performance and communities of preference. GEDM
     on Software Engineering, IEEE Press (2012), 419–429.                 2015, in press.
[20] Kruck, S.E., Teer, F., and Jr, W.A.C. GSLAP: a graph-based      [23] M. Fire, G. Katz, Y. Elovici, B. Shapira, and L. Rokach,
     web analysis tool. Industrial Management & Data Systems              “Predicting Student Exam’s Scores by Analyzing Social
     108, 2 (2008), 162–172.                                              Network Data,” in Active Media Technology, R. Huang, A.
                                                                          A. Ghorbani, G. Pasi, T. Yamaguchi, N. Y. Yen, and B. Jin,
[21] D. Yang, M. Wen, A. Kumar, E. P. Xing, and C. P. Rose,
                                                                          Eds. Springer Berlin Heidelberg, 2012, pp. 584–595.
     “Towards an integration of text and graph clustering methods
     as a lens for studying social interaction in MOOCs,” The
     International Review of Research in Open and Distributed
     Learning, vol. 15, no. 5, Oct. 2014.

     Figure 3. Time sequence of activity in the forums in the three courses by students and instructors grouped by activity type
BOTS: Selecting Next-Steps from Player Traces in a Puzzle
                         Game

                   Drew Hicks                           Yihuan Dong                            Rui Zhi
               North Carolina State                  North Carolina State                North Carolina State
                    University                            University                          University
                 911 Oval Drive                        911 Oval Drive                      911 Oval Drive
               Raleigh, NC 27606                     Raleigh, NC 27606                   Raleigh, NC 27606
             aghicks3@ncsu.edu       ydong2@ncsu.edu           rzhi@ncsu.edu
                          Veronica Cateté         Tiffany Barnes
                                  North Carolina State                 North Carolina State
                                       University                           University
                                    911 Oval Drive                       911 Oval Drive
                                  Raleigh, NC 27606                    Raleigh, NC 27606
                                vmcatete@ncsu.edu                  tmbarnes@ncsu.edu

ABSTRACT                                                          model student behavior in tutors, and provide insight into
In the field of Intelligent Tutoring Systems, data-driven meth-   problem-solving strategies and misconceptions. This data
ods for providing hints and feedback are becoming increas-        structure can be used to provide hints, by treating the Inter-
ingly popular. One such method, Hint Factory, builds an           action Network similarly to a Markov Decision Process and
interaction network out of observed player traces. This data      selecting the most appropriate next step from the requesting
structure is used to select the most appropriate next step        user’s current state. This method has successfully been em-
from any previously observed state, which can then be used        ployed in systems in which each action a player may take is
to provide guidance to future players. However, this method       of similar cost; for example in the Deep Thought logic tutor
has previously been employed in systems in which each ac-         each action is an application of a particular axiom. Applying
tion a player may take requires roughly similar effort; that      this method to an environment where actions are of differ-
is, the “step cost” is constant no matter what action is taken.   ent costs, or outcomes are of varying value will require some
We hope to apply similar methods to an interaction network        adaptations to be made. In this work, we discuss how we
built from player traces in our game, BOTS; However, each         will apply Hint Factory methods to an interaction network
edge can represent a varied amount of effort on the part of       built from player traces in a puzzle game, BOTS. In BOTS,
the student. Therefore, a different hint selection policy may     each “Action” is the set of changes made to the program
be needed. In this paper, we discuss the problems with our        between each run. Therefore, using the current hint selec-
current hint policy, assuming all edges are the same cost.        tion policy would result in very high-level hints comprising
Then, we discuss potential alternative hint selection policies    a great number of changes to the student’s program. Since
we have considered.                                               this is undesirable, a different hint selection policy may be
                                                                  needed.
Keywords
Hint Generation, Serious Games, Data Mining                       2.    DATA-DRIVEN HINTS AND FEEDBACK
                                                                  In the ITS community, several methods have been proposed
1.   INTRODUCTION                                                 for generating hints/feedback from previous observations of
Data-driven methods for providing hints and feedback are          users’ solutions or behavior. Rivers et al propose a data-
becoming increasingly popular, and are especially useful for      driven method to generate hints automatically for novice
environments with user- or procedurally-generated content.        programmers based on Hint Factory[8]. They present a
One such method, Hint Factory, builds an interaction net-         domain-independent algorithm, which automates hint gener-
work out of observed player traces. An Interaction Network        ation. Their method relies on solution space, which utilizes
is a complex network of student-tutor interactions, used to       graph to represent the solution states. In solution space,
                                                                  each node represents a candidate solution and each edge rep-
                                                                  resents the action used to transfer from one state to another.
                                                                  Due to the existence of multiple ways to solve a program-
                                                                  ming problem, the size of the solution space is huge and thus
                                                                  it is impractical to use. A Canonicalizing model is used to
                                                                  reduce the size of the solution space. All states are trans-
                                                                  formed to canonicalized abstract syntax trees (ASTs). If the
                                                                  canonical form of two different states are identical, they can
                                                                  be combined together. After simplifying the solution space,
                                                                  hint generation is implemented. If the current state is in-
correct and not in the solution space, the path construction
algorithm will find an optimal goal state in the solution space
which is closest to current state. This algorithm uses change
vectors to denote the change between current state and goal
state. Once a better goal state is found during enumerat-
ing all possible changes, it returns the current combination
of change vectors. Each change vector can be applied to
current state and then form an intermediate state. The in-
termediate states are measured by desirability score, which
represents the value of the state. And then the path con-
struction algorithm generates optimal next states based on
the rank of the desirability scores of all the intermediate
states. Thus a new path can be formed and added to the
solution space, and appropriate hints can be generated.

Jin et al propose linkage graph to generate hints for pro-
gramming courses[4]. Linkage graph uses nodes to represent
program statements and direct edges to indicate ordered de-
                                                                  Figure 1: The BOTS interface. The robot’s program
pendencies of those statements. Jin’s approach applies ma-
                                                                  is along the left side of the screen. The “toolbox” of
trix to store linkage graph for computation. To generate
                                                                  available commands is along the top of the screen.
linkage matrix, first, they normalize variables in programs
by using instructor-provided variable specification file. After
variable normalization, they sort the statement with 3 steps:
(i) preprocessing, which breaks a single declaration for mul-     information and check if hints are available for the state.
tiple variables (e.g. int a, b, c) into multiple declaration      The action that leads to subsequent state with the highest
statements (e.g. int a; int b; int c;); (ii) creating statement   value is used to generate a hint sequence. A hint sequence
sets according to variable dependencies, which put indepen-       consists of four types of hints and are ordered from gen-
dent statements into first set, put statements depend only        eral hint to detailed hint. Hint provider will then show hint
on statements in the first set into second set, put statements    from top of the sequence to the student. Hint Factory has
depends only on statements in the first and second set into       been applied in logic tutors which helps students learn logic
third set, and so on; (iii) in-set statement sorting, during      proof. The result shows that the hint-generating function
which the statements are sorted in decreasing order within        could provide hints over 80% of the time.
set using their variable signatures. In hint generation, they
first generate linkage graphs with a set of correct solutions,    3.   BOTS
as the sources for hint generation. They also compose the
                                                                  BOTS is a programming puzzle game designed to teach
intermediate steps during program development into a large
                                                                  fundamental ideas of programming and problem-solving to
linkage graph, and assign a reward value to each state and
                                                                  novice computer users. The goal of the BOTS project is
the correct solution. Then, they apply value iteration to
                                                                  to investigate how to best use community-authored content
create a Markov Decision Process (MDP). When a student
                                                                  within serious games and educational games. BOTS was
requires hint, tutor will generate a linkage graph for the par-
                                                                  inspired by games like LightBot [10] and RoboRally [2], as
tial program and try to find the closest match in MDP. If
                                                                  well as the success of Scratch and it’s online community [1]
a match is found in MDP, the tutor would generate hint
                                                                  [5]. In BOTS, players take on the role of programmers writ-
with the next best state based on highest assigned value.
                                                                  ing code to navigate a simple robot around a grid-based 3D
If a match is not found in current MDP, which means the
                                                                  environment, as seen in Figure 1. The goal of each puzzle
student is taking a different approach from existing correct
                                                                  is to press several switches within the environment, which
solutions, the tutor will try to modify those correct solutions
                                                                  can be done by placing an object or the robot on them.
to fit student’s program and then provide hints.
                                                                  To program the robots, players will use simple graphical
                                                                  pseudo-code, allowing them to move the robot, repeat sets
Hint Factory[9] is an automatic hint generation technique
                                                                  of commands using “for” or “while” loops, and re-use chunks
which uses Markov decision processes (MDPs) to generate
                                                                  of code using functions. Within each puzzle, players’ scores
contextualized hints from past student data. It mainly con-
                                                                  depend on the number of commands used, with lower scores
sists of two parts - Markov Decision Process (MDP) gener-
                                                                  being preferable. In addition, each puzzle limits the maxi-
ator and hint provider. The MDP generator runs a process
                                                                  mum number of commands, as well as the number of times
to generate MDP values for all states seen in previous stu-
                                                                  each command can be used. For example, in the tutorial
dents’ solutions. In this process, all the students’ solutions
                                                                  levels, a user may only use the “Move Forward” instruction
are combined together to form a single graph. Each node
                                                                  10 times. Therefore, if a player wants to make the robot
of the graph represents a state, and each edge represents an
                                                                  walk down a long hallway, it will be more efficient to use a
action one student takes to transform from current state to
                                                                  loop to repeat a single “Move Forward” instruction, rather
another state. Once the graph is built, the MDP generator
                                                                  than to simply use several “Move Forward” instructions one
uses Bellman backup to assign values for all nodes. After up-
                                                                  after the other. These constraints are meant to encourage
dating all values, a hint file is generated. The hint provider
                                                                  players to re-use code and optimize their solutions.
uses hint file to provide hint. When a student asks for a hint
at a existing state, hint provider will retrieve current state
                                                                  In addition to the guided tutorial mode, BOTS also con-
tains an extensive “Free Play” mode, with a wide selection
of puzzles created by other players. The game, in line with                      a                               b
the “Flow of Inspiration” principles outlined by Alexander
Repenning [7], provides multiple ways for players to share
knowledge through authoring and modifying content. Play-
ers are able to create their own puzzles to share with their
peers, and can play and evaluate friends’ puzzles, improv-
ing on past solutions. Features such as peer-authored hints                      c                               d
for difficult puzzles, and a collaborative filtering approach
to rating are planned next steps for the game’s online ele-
ment. We hope to create an environment where players can                                  Error
continually challenge their peers to find consistently better
solutions for increasingly difficult problems.                                   e                               f
User-generated content supports replayability and a sense
of a community for a serious game. We believe that user-
created puzzles could improve interest, encouraging students
to return to the game to solidify their mastery of old skills
and potentially helping them pick up new ones.                     Figure 2: Several generated hints for a simple puz-
                                                                   zle. The blue icon represents the robot. The ’X’ icon
4. ANALYSIS                                                        represents a goal. Shaded boxes are boxes placed on
                                                                   goals, while unshaded boxes are not on goals. Hint
4.1 Dataset                                                        F in this figure depicts the start and most common
Data for the BOTS studies has come from a middle school            goal state of the puzzle.
computer science enrichment program called SPARCS. In
this program, the students attend class on Saturday for 4
hours where computer science undergraduates teach them             (start/intermediate) → (intermediate/goal) These transitions
about computational thinking and programming. Students             occur when a student moves the robot to an intermediate
attend a total of 7 sessions, each on a different topic, ranging   state rather than directly to the goal. Since players run
from security and encryption to game design. The students          their programs to produce output, we speculate that these
all attend the same magnet middle school. The demograph-           may represent subgoals such asmoving a box onto a specific
ics for this club are 74.2% male, 25.8% female, 36.7% African      switch. After accomplishing that task, the user then ap-
American, and 23.3% Hispanic. The student’s grade distri-          pends to their program, moving towards a new objective,
bution is 58% 6th grade, 36% 7th grade and 6% 8th grade.           until they reach a goal state. Hint B in Figure 2 shows a
                                                                   hint generated from such a transition.
From these sessions, we collected gameplay data for 20 tu-
torial puzzles as well as 13 user-created puzzles, With this
data, we created an Interaction Network in order to be able        4.2.2    Correction Transition
to provide hints and feedback for future students [3]. How-        (error/mistake) → (intermediate/goal) This transition oc-
ever, using program edits as states, the interaction networks      curs when a student makes and then corrects a mistake.
produced were very sparse. In order to be better able to           These are especially useful because we can offer hints based
relate similar actions, we produced another interaction net-       on the type of mistake. Hints D and E in Figure 2 show hints
work using program output as our state definition [6].             built from this type of transition; however, hint E shows a
                                                                   case where a student resolved the mistake in a suboptimal
                                                                   way.
4.2     States and Transitions
Based on the data collected, we can divide the set of ob-
served states into classes. First among these is the start state   4.2.3    Simple Solution Transition
in which the problem begins. By definition, every player’s         (start) → (goal) This occurs when a student enters an entire,
path must begin at this state. Next is the set of goal states      correct program, and solves the puzzle in one attempt. This
in which all buttons on the stage are pressed. These are           makes such transitions not particularly useful for generating
reported by the game as correct solutions. Any complete            hints, other than showing a potential solution state of the
solution, by definition, ends at one such state. Among states      puzzle. Hint F in Figure 2 shows this type of transition.
which are neither start nor goal states, there are three im-
portant classifications: Intermediate states (states a robot
moves through during a correct solution), mistake states
                                                                   4.2.4    Rethinking Transition
(states a robot does not move through during a correct so-         (intermediate) → (intermediate/goal) This transition occurs
lution), and error states (states which result from illegal        when rather than appending to the program as in a subgoal
output, like attempting to move the robot out-of-bounds).          transition, the user deletes part or all of their program, then
Based on these types of states, we classified our hints based      moving towards a new goal. As a result, the first state is
on the transitions they represented.                               unrelated to the next state the player reaches. Offering this
                                                                   state as a hint would likely not help guide a different user.
                                                                   Hint A in Figure 2 shows an example of this. Finding and
4.2.1    Subgoal Transition                                        recognizing these is an important direction for future work.
 4.2.5    Error Transition
(start/intermediate) → (mistake/error) This corresponds to
a program which walks the robot out of bounds, into an
object, or other similar errors. While we disregarded these
as hints, this type of transition may still be useful. In such
a case, the last legal output before the error could be a
valuable state. Hint C in Figure 2 is one such case.

4.3      Next Steps
While this approach was able to help us identify interest-
                                                                       Figure 3: This subgraph shows a short-circuit where
ing transitions, as well as significantly reduce the sparseness
                                                                       a player bypasses a chain of several steps.
of the Interaction Network by merging states with similar
output, we violate several assumptions of the Hint Factory
technique by using user compilation as an action. Essen-               Another problem which arises from this state representation
tially, the cost of an action can vary widely. In the most             is seen in Hints C and E above. These hints show states
extreme examples, the best next state selected by Hint Fac-            where a student traveled from a state to a worse state before
tory will simply be the goal state.                                    ultimately solving the problem. Since we limit our search for
                                                                       hintable states to the immediate child states of s in s0 , we are
4.4      Current Hint Policy                                           unable to escape from such a situation if the path containing
Our current hint selection policy is the same as the one used          the error is the best or only observable path to the goal.
in the logic tutor Deep Thought with a few exceptions [9].
We combine all student solution paths into a single graph,
mapping identical states to one another (comparing either
                                                                       4.5    Proposed Hint Policies
                                                                       One potential modification of the hint policy involves ana-
the programs or the output). Then, we calculate a fitness
                                                                       lyzing the programs/output on the nodes, using some dis-
value for each node. We assign a large positive value (100)
                                                                       tance metric δ(s, s0 ). This measurement would be used in
to each goal state, a low value for dead-end states (0) and
                                                                       addition to the state’s independent fitness value R(s) which
a step cost for each step taken (1). Setting a non-zero cost
                                                                       takes into account distance from a goal, but is irrespective
on actions biases us towards shorter solutions. We then
                                                                       of the distance from any previous state. For example in the
calculate fitness values V (s) for each state s, where R(s) is
                                                                       short-circuit example above, using “difference in number of
the initial fitness value for the state, γ is a discount factor,
                                                                       lines of code” as a distance metric we could take into ac-
and P (s, s0 ) is the observed frequency with which users in
                                                                       count how far the “Goal” state is from the “Start” state, and
state s go to state s0 next, via taking the action a. The
                                                                       potentially choose a nearer state as a hint. This also helps
equation for determining the fitness value of a state is as
                                                                       correct for small error-correction steps in player solutions;
follows:
                                                                       if the change between the current state and the target hint
                                                                       state is very small, we may want to consider hinting toward
                                                                       the next step instead, or a different solution path altogether.
                                   X
           V (s) := R(s) + γmax             Pa (s, s0 )V (s0 )   (1)
                               a
                                       s0
                                                                                                       X
                                                                              V (s) := R(s) + γmax          δ(s, s0 )P (s, s0 )V (s0 )   (3)
                                                                                                   a
However, in our current representation there is only one                                               s0
available action from any state: “run.” Different players us-
ing this action will change their programs in different ways
                                                                       One potential downside to this approach is that it requires
between runs, so it is not useful to aggregate across all the
                                                                       somewhat more knowledge of the domain to be built into the
possible resulting states. Instead, we want to consider each
                                                                       model. If the distance metric used is inaccurate or flawed,
resulting state on its own. As a result, we use a simplified
                                                                       there may be cases where we choose a very suboptimal hint.
version of the above, essentially considering each possible
                                                                       using difference in lines of code as our distance metric, the
resulting state s0 as the definite result of its own action:
                                                                       change between a state where a player is using no functions
                                                                       and a state where the user writes existing code into a func-
                                                                       tion may be very small. Hints selected in these cases might
             V (s) := R(s) + γmax
                               0
                                 P (s, s0 )V (s0 )               (2)   guide students away from desired outcomes in our game.
                                   s

                                                                       Another problem we need to resolve with our current hint
Since the action “run” can encompass many changes, select-             policy, as discussed above, is the case where the best or
ing the s0 which maximizes the value may not always be the             only path to a goal from a given state s has an error as
best choice for a hint. The difference between s and s0 can            a direct child s0 . One method of resolving this could be,
be quite large, and this is usually the case when an expert            instead of offering s0 as a hint, continuing to ask for next-
user solves the problem in one try, forming an edge directly           step hints from s0 until some s0 is a hintable, non-error state.
between the “start” state and “goal” state. These and other            This solution requires no additional knowledge of the game
“short-circuits” make it difficult to assess which of the child        domain, however it’s possible that the hint produced will
nodes would be best to offer as a hint by simply using the             be very far from s, or that we may skip over important
calculated fitness value.                                              information about how to resolve the error or misconception
that led the student into state s in the first place.             [4] W. Jin, T. Barnes, J. Stamper, M. J. Eagle, M. W.
                                                                      Johnson, and L. Lehmann. Program representation for
Other modifications to the hint selection policy may produce          automatic hint generation for a data-driven novice
better results than these. We hope to look into as many               programming tutor. In Intelligent Tutoring Systems,
possible modifications as we can, seeing which modifications          pages 304–309. Springer, 2012.
produce the most suitable hints on our current dataset be-        [5] D. J. Malan and H. H. Leitner. Scratch for budding
fore settling on an implementation for the live version of the        computer scientists. ACM SIGCSE Bulletin,
game.                                                                 39(1):223–227, 2007.
                                                                  [6] B. Peddycord III, A. Hicks, and T. Barnes.
5.   ACKNOWLEDGMENTS                                                  Generating hints for programming problems using
Thanks to the additional developers who have worked on this           intermediate output.
project or helped with our outreach activities so far, includ-    [7] A. Repenning, A. Basawapatna, and K. H. Koh.
ing Aaron Quidley, Trevor Brennan, Irena Rindos, Vincent              Making university education more like middle school
Bugica, Victoria Cooper, Dustin Culler, Shaun Pickford,               computer club: facilitating the flow of inspiration. In
Antoine Campbell, and Javier Olaya. This material is based            Proceedings of the 14th Western Canadian Conference
upon work supported by the National Science Foundation                on Computing Education, pages 9–16. ACM, 2009.
Graduate Research Fellowship under Grant No. 0900860              [8] K. Rivers and K. R. Koedinger. Automating hint
and Grant No. 1252376.                                                generation with solution space path construction. In
                                                                      Intelligent Tutoring Systems, pages 329–339. Springer,
6.   REFERENCES                                                       2014.
 [1] I. F. de Kereki. Scratch: Applications in computer           [9] J. Stamper, T. Barnes, L. Lehmann, and M. Croy.
     science 1. In Frontiers in Education Conference, 2008.           The hint factory: Automatic generation of
     FIE 2008. 38th Annual, pages T3B–7. IEEE, 2008.                  contextualized help for existing computer aided
 [2] R. Garfield. Roborally. [Board Game], 1994.                      instruction. In Proceedings of the 9th International
 [3] A. Hicks, B. Peddycord III, and T. Barnes. Building              Conference on Intelligent Tutoring Systems Young
     games to learn from their players: Generating hints in           Researchers Track, pages 71–78, 2008.
     a serious game. In Intelligent Tutoring Systems, pages      [10] D. Yaroslavski. LightBot. [Video Game], 2008.
     312–317. Springer, 2014.
     Semantic Graphs for Mathematics Word Problems based
                 on Mathematics Terminology

          Rogers Jeffrey Leo John                  Thomas S. McTavish               Rebecca J. Passonneau
            Center for Computational               Center for Digital Data,           Center for Computational
               Learning Systems                 Analytics & Adaptive Learning            Learning Systems
              Columbia University                          Pearson                      Columbia University
              New York, NY, USA                        Austin, TX, USA                  New York, NY, USA
             rl2689@columbia.edu                tom.mctavish@pearson.com becky@ccls.columbia.edu

ABSTRACT                                                          ercises, we randomly walk a graph using the cosine distance
We present a graph-based approach to discover and extend          as edge weights. We also recast the problem as a bipartite
semantic relationships found in a mathematics curriculum          graph with exercises on one side and words on the other,
to more general network structures that can illuminate re-        providing an edge when an exercise contains the math word.
lationships within the instructional material. Using words        We contrast these two different types of random walks and
representative of a secondary level mathematics curriculum        find somewhat similar results, which lends confidence to the
we identified in separate work, we constructed two similar-       analysis. The bipartite graph walks, however, are more sen-
ity networks of word problems in a mathematics textbook,          sitive to differences in word frequency. Casting measures of
and used analogous random walks over the two networks             similarity as graphs and performing random walks on them
to discover patterns. The two graph walks provide similar         affords more nuanced ways of relating objects, which can be
global views of problem similarity within and across chap-        used to build more granular domain models for analysis of
ters, but are affected differently by number of math words        prerequisites, instructional design, and adaptive learning.
in a problem and math word frequency.

1.    INTRODUCTION                                                2.   RELATED WORK
Curricula are compiled learning objects, typically presented      Random walks over graphs have been used extensively to
in sequential order and arranged hierarchically, as in a book’s   measure text similarity. Applications include similarity of
Table of Contents. Ideally, a domain model captures rela-         web pages [15] and other documents [5], citations [1], pas-
tionships between the learning objects and the knowledge          sages [14], person names in email [12] and so on. More re-
components or skills they exercise. Unfortunately, domain         cently, general methods that link graph walks with external
models are not often granular enough for optimal learning         resources like WordNet have been developed to produce a
experiences. For example, prerequisite relationships may be       single system that handles semantic similarity for words,
lacking, or the knowledge components associated with an           sentences or text [16]. Very little work compares walks over
exercise may be unknown. In such cases, assessments on            graphs of the same content, where the graphs have differ-
those learning objects will be insufficient to enable appro-      ent structure. We create two different kinds of graphs for
priate redirection unless expert (i.e. teacher) intervention      mathematics word problem and compare the results. We
is explicitly given. Domain models remain coarse because          find that the global results are very similar, which is good
using experts to enumerate and relate the knowledge com-          evidence for the general approach, and we find differences in
ponents is costly.                                                detail that suggest further investigation could lead to cus-
                                                                  tomizable methods, depending on needs.
As a means to automatically discover relationships among
learning objects and to reveal their knowledge components,        An initiative where elementary science and math tests are a
we demonstrate the use of direct similarity metrics and ran-      driver for artificial intelligence has led to work on knowledge
dom graph walks to relate exercises in a mathematics cur-         extraction from textbooks. Berant et al. [2] create a system
riculum. We first apply a standard cosine similarity measure      to perform domain-specific deep semantic analysis of a 48
between pairs of exercises, based on bag-of-word vectors con-     paragraphs from a biology textbook for question answering.
sisting of math terms that we identified in separate work         Extracted relations serve as a knowledge base against which
[7]. Then, to extract less explicit relationships between ex-     to answer questions, and answering a question is treated as
                                                                  finding a proof. A shallow approach to knowledge extraction
                                                                  from a fourth grade science curriculum is taken in [6], and
                                                                  the knowledge base is extended through dialog with users
                                                                  until a path in the knowledge network can be found that
                                                                  supports a known answer. In the math domain, Kushman et
                                                                  al. [10] generate a global representation of algebra problems
                                                                  in order to solve them by extracting relations from sentences
                                                                  and aligning them. Seo et al. [18] study text and diagrams
                                                                  together in order to understand the diagrams better through
                                                                  textual cues. We are concerned with alignment of content
 Two machines produce the same type of widget. Machine              high agreement. All the non-stop words were then labeled
 A produces W widgets, X of which are damaged. Machine B            by a trained annotator. We developed a supervised machine
 produces Y widgets, Z of which are damaged. The fraction of        learning approach to classify vocabulary into math and non-
                                      X
 damaged widgets for Machine A is W     or (simplified fraction).   math words [7] that can be applied to new mathematics cur-
                                                            Z
 The fraction of damaged widgets for Machine B is Y            or   ricula. For the text used here, there were 577 math terms.
 (simplified fraction). Write each fraction as a decimal and a
 percent. Use pencil and paper. Select a small percent that
 would allow for a small number of damaged widgets. Find
                                                                    3.3    Random Walks in Graphs
                                                                    A random walk on a graph starts at a given node and steps
 the number of widgets by which each machine exceeded the
                                                                    with random probability to a neighboring node. The same
 acceptable number of widgets.
                                                                    random decision process is employed at this and every sub-
                                                                    sequent node until a termination criterion is met. Each time
  Figure 1: Sample problem; math terms are in boldface.             a node is visited, it is counted. Open random walks require
                                                                    that the start node and end nodes differ. Traversal methods
                                                                    may employ a bias to navigate toward or away from certain
across rather than within problems, and our objective is            neighbors through edge weights or other graph attributes.
finer-grained analysis of curricula.
                                                                    In a graph, G = (V, E) with nodes V and edges E, a ran-
Other work that addresses knowledge representation from             dom walk that begins at vx and ends at vy can be denoted as
text includes ontology learning [3], which often focuses on         (vx , ..., vy ). By performing several random walks, the frac-
the acquisition of sets of facts from text [4]. There has           tion of times the node vy is visited converges to the prob-
been some work on linking lexical resources like WordNet            ability of target vy being visited given the start node vx ,
or FrameNet to formal ontologies [17, 13], which could pro-         which can be expressed as P (vy |vx ) under the conditions of
vide a foundation for reasoning over facts extracted from           the walk. In the case of a random walk length of 1, P (vy |vx )
text. We find one work that applies relation mining to e-           will simply measure the probability of vy being selected as
learning: Šimko and Bieliková [19] apply automated relation       an adjacent node to vx .
mining to extract relations to support e-course authoring in
the domain of teaching functional programming. Li et al.
[11] apply k-means clustering to a combination of problem           3.4    Cosine Similarity Graph
features and student performance features, and propose the          Math exercises are represented as bag-of-words vectors with
clusters correspond to Knowledge Components [8].                    boolean values to indicate whether a given math term is
                                                                    present. Cosine similarity quantifies the angle between the
                                                                    two vectors, and is given by the dot product of two vectors.
3. METHODS                                                                                             Pn
3.1 Data                                                                              te                  i=1 ti ei
                                                                        cos(t, e) =         = pPn             pPn              (1)
We used 1800 exercises from 17 chapters of a Grade 7 mathe-                         ktkkek         i=1 (ti )2      i=1 (ei )
                                                                                                                             2

matics curriculum. Most are word problems, as illustrated in
                                                                    Similarity values of 1 indicate that both the vectors are the
Figure 1. They can incorporate images, tables, and graphs,
                                                                    same whereas a value of zero indicates orthogonality between
but for our analysis, we use only the text. The vocabu-
                                                                    the two vectors. Pairwise cosine similarities for all 1800
lary of the resulting text consists of 3,500 distinct words.
                                                                    exercises were computed, yielding a cosine similarity matrix
We construct graphs where math exercises are the nodes, or
                                                                    Mcos . The matrix corresponds to a graph where non-zero
in a bipartite graph, math exercises are the left side nodes
                                                                    cosine similarities are edge weights between exercises.
and words are the right side nodes. Our initial focus is on
exercise similarity due to similarity of the math skills that
                                                                    In a graph walk, the probability that a node vy will be
exercises tap into, and we use mathematics terminology as
                                                                    reached in one step from a node vx is given by the prod-
an indirect proxy of skills a problem draws upon.
                                                                    uct of the degree centrality of vx and the normalized edge
                                                                    weight (vx , vy ). With each exercise as a starting node, we
3.2    Math Terminology                                             performed 100,000 random walks on the cosine-similarity
The text of the word problems includes ordinary language            graph, stepping with proportional probability to all outgo-
expressions unrelated to the mathematics curriculum, such           ing cosine similarity weights. To measure 2nd degrees of
as the nouns machines, widgets shown in problem in Fig-             separation, with each walk we made two steps.
ure 1, or the verbs produces, damaged. For our purposes,
mathematics terminology consists of words that expresses            For two math vectors considered as the sets A and B, cosine
concepts that are needed for the mathematical competence            similarity can be conceptualized in terms of the intersection
the curriculum addresses. To identify these terms, we devel-        set C = A ∪ B and set differences A \ B and B \ A. Cosine
oped annotation guidelines for human annotators who label           similarity is high when |C|  A \ B and |C|  B \ A.
words in their contexts of use, and assessed the reliability of
annotation by these guidelines. Words can be used in the            The degree of a node affects the probability of traversing
math texts sometimes in a math sense and sometimes in a             any edge from that node. The two factors that affect de-
non-math sense. Annotators were instructed to label terms           gree centrality of a start node are the document frequencies
based on the most frequent usage.                                   of its math words, and the total number of math words.
                                                                    Here, document frequency (df) is the normalized number of
Using a chance-adjusted agreement coefficient in [-1,1] [9],        exercises a word occurs in. A high df math word in a prob-
reliability among three annotators was 0.81, representing           lem increase its degree centrality because there will be more
Figure 2: Exercise-to-Exercise similarity in Chapter 6. The exercises of Chapter 6 are displayed in row-columns in a square
matrix. The rows represent source nodes and columns represent targets. Each row has been normalized across the book, even
though only Chapter 6 is shown. The axes demarcate the sections of the chapter. Mcos is the cosine similarity. Mcosrw is the
output from the random walk using cosine similarity edge weights, Mbp is the output from the random bipartite walk. Raw
values displayed between 0 and 0.005 corresponding to light and dark pixels, respectively.


problems it can share words with, resulting in non-zero co-       the distribution across the row as a probability distribution
sine values and therefore edges. The number of math words         to all other nodes for that source node.
in a problem also increases its degree centrality.
                                                                  4.    RESULTS
3.5    Bipartite exercise and word graph                          We compare the three measures of similarity between ex-
The set of exercises Ve are the left-side nodes and the math      ercises: 1) cosine similarity, 2) random walks using cosine
words Vw are the right-side nodes in the undirected bipartite     similarity as edge weights, and 3) random walks along a bi-
graph G = (Ve , Vw , E), where an edge exists between vex and     partite graph of exercises and words.
vwi if exercise x contains the math word i.

We performed open random walks on this graph to measure
                                                                  4.1   Exercise-to-Exercise Similarity
                                                                  We describe exercise-to-exercise similarity with square ma-
similarity between nodes. To measure the similarity of ex-
                                                                  trices where each exercise is represented as a row-column. A
ercises, we walk in even steps – a step to a connected word
                                                                  number of features of the measures are embedded in Figure
followed by a step back to one of the exercises that shares
                                                                  2, which shows heatmaps of color values for pairs of exercises
that word. The degrees of separation between vertices on
                                                                  in chapter 6 for each matrix. We find that within chapters
the same side of the graph (e.g. exercise-to-exercise) will be
                                                                  and especially within sections of those chapters, there is a
l/2 where l is the length of the walk. In this paper, we ex-
                                                                  high degree of similarity between exercises regardless of the
plored first and second degrees of separation so our bipartite
                                                                  measure. This demonstrates that words within sections and
graphs had a walk length of 4.
                                                                  chapters share a common vocabulary. We can see that Mcos
Table 1: Summary statistics of the similarity distributions       has more extreme values than Mcosrw ; as explained below,
                                                                  it has both more zero cosine values, and more very high
                cosine            rwcos          rwbp             values. This is most likely because Mcosrw , from doing the
   minimum      0              0              0                   walk, picks up exercises that are another degree of separa-
   maximum      6.3 ×10−2      0.50           0.11                tion away. When the row of the matrix is normalized to
   mean         5.55 ×10−4     5.55 ×10−4     5.55 ×10−4          capture the distribution of the source node, the otherwise
   median       0              2.41 ×10−4     2.06 ×10−4          high values from Mcos are tempered in the Mcosrw matrix.
   std. dev.    1.19 ×10−3     8.57 ×10−4     1.24 ×10−3          This shift to a large number of lower scores is shown in the
                                                                  bottom panel of Figure 3. Mbp and Mcosrw are very similar,
Because exercise nodes are connected via word nodes, we in-       but Mbp generally has a wider dynamic range.
terpret the fraction of node visits as a similarity measure be-
tween the source node and any node visited. We performed          4.2   Comparison of the Graph Walks
100,000 random walks from each node. Exercise-to-exercise         Table 1 provides summary statistics for cosine similarity and
similarity can be visualzed as square matrices with source        the two random walks for all pairs of problems (N=3,250,809).
nodes in the rows and target nodes in the columns. To fac-        The cosine matrix is very sparse, as shown by the median
tor out the times a source may have been selected as one of       value of 0. Of the two random walk similarities, rwcos has
the targets, we set the diagonal of the matrix to zero. We        a lower standard deviation around the mean, but otherwise
then normalized across the rows so that we could interpret        the two random walks produce similar distributions.
The similarity values given by cosine and the cosine random       Table 3: Two pairs of problems with high cosine similarity
walk will increasingly differ the more that the start problem     and reverse patterns of graph walk similarity. The first pair,
has relatively higher degree centrality due either to more        from section 5, have lower than average rwcos and higher
words or higher frequency of words in exercises (df). For         than average rwbp due to relatively many words in common
reference, the word that occurs most frequently, number,          (12 and 14). The second pair, from section 6, have higher
has a df of 0.42, and the second most frequent occurs in          than average rwcos and lower than average rwbp due to rel-
only 15% of the exercises. Fifty eight nodes have no edges        atively few words in common.
(0 degree), the most frequent number of edges is 170, and
the maximum is 1,706. Table 2 gives the summary statistics         Prob 1    Prob 2    cosine    rwcos      rwbp   N    max df
for df, number of math words, and degree centrality.                6.5.99    6.5.85   1.0000   0.0032    0.0102   12     0.42
                                                                    6.5.94    6.5.83   0.8819   0.0026    0.0064   14     0.13
Inspection of the data shows that for pairs of problems in         6.6.109   6.6.102   0.8660   0.0068    0.0037    3     0.11
the two chapters for our case study, if the cosine similar-        6.6.104   6.6.102   0.7746   0.0068    0.0029    3     0.11
ity between a pair is high (≥ 0.75), the similarity values
for rwcos tend to go down as the number of shared word
increases from 3 to between 5 and 7. For the rwbp , the           cosine similarity is less than 0.20, the two walks produce
opposite trend occurs, where the similarity goes up as the        very similar results. The average similarity values for the
number of words increases. This difference helps account for      bipartite walk are about 20% higher, and the maximum val-
an observed divergence in the two graph walks for sections        ues are higher, but the two walks produce similar means,
5 and 6 of Chapter 6.                                             independent of the lengths of the common word vectors, or
                                                                  the total number of math words.
Table 3 illustrates two pairs of problems from section 5 that
have high cosine similarities, and relatively higher rwbp sim-    Since we normalized the matrices across rows, which are
ilarities (greater than the rw means of 0.0055) and relatively    the source nodes, differences between the bipartite matrix,
lower rwcos (lower than the rw means). The reverse pattern        Mbp , and the cosine matrices implied that the degree of the
is seen for two pairs of problems from section 6 that have        target node had a greater impact on the variability in the
high cosine similarities. These problems have higher than         bipartite matrix. To measure the impact of the edge degree
average rwcos and lower than average rwbp . What differen-        on the target nodes, we considered the column sum for those
tiates the two pairs of problems is that the section 5 prob-      targets that had 1 edge, those that had 2, etc. up to 20
lems have a relatively large number of words in common:           edges. The results are summarized in Figure 4. As can be
14 for the first pair, 12 for the second pair. In both pairs,     seen, the column sum varies linearly by the number of target
some of the words have relatively high document frequency.        edges in the bipartite matrix, whereas the cosine matrices
As discussed above, these two properties increase the degree      do not. We found the cubed root of the column sum in Mbp
centrality of the start node of a step in the rwcos graph, and    approaches the distribution of column sums of the cosine
thus lower the probability of hitting each of the start node’s    matrices, which is provided in Figure 4.
one-degree neighbors. This effect propagates along the two
steps of the walk. For the rwbp graph, however, as the num-
ber of shared math words for a pair of problems increases,
the number of paths from one to the other also increases,
thus raising the probability of the traversal. This effect also
propagates through a two-step walk. In contrast to the sec-
tion 5 problems, the two section 6 problems have relatively
fewer words in common: 3 for both pairs.

For problem pairs where the cosine similarity is between
0.40 and 0.60, the mean similarity from rwbp is 30% higher
than for rwcos for when the number of math words in com-
mon is 3 (0.0033 vs. 0.0043), 80% higher when the number
of math words in common is 6 (0.0024 versus 0.0045), and          Figure 3: Tail distribution of similarity values in Mcos ,
three as high when the number of math words in common             Mcosrw , and Mbp . Because 62% of the values in Mcos are 0,
is 9 (0.0023 versus 0.0068). For problems pairs where the         the plot shows only non-zero values.

                                                                  5.   CONCLUSION
Table 2: Summary statistics of document frequency (df) of
                                                                  Visualization of the three similarity matrices shows they re-
math words, number of math words in problems, and degree
                                                                  veal the same overall patterns, thus each is confirmed by
centrality of rwcos
                                                                  the others. However, the bipartite walk was the most sen-
                                                                  sitive to word frequency across exercises, and the number
                   df        math words     degree ctr. rwcos
                                                                  of words in problems. With our goal of automatically dis-
 minimum      5.54 ×10−4     1              0                     covering knowledge components and identifying their rela-
 maximum      0.424          24             1,706                 tionships, the random walk that stepped in proportion to
 mean         0.183          8.35           418.4                 its cosine similarity performed best. It was able to discover
 median       6.66 ×10−3     8.00           340                   second-degree relationships that seem reasonable as we ex-
 std. dev.    0.314          3.64           317.1                 plore by eye those matches. Future work will test these re-
                                                                     knowledge-learning-instruction (KLI) framework:
                                                                     Toward bridging the science-practice chasm to
                                                                     enhance robust student learning. Cognitive Science,
                                                                     36(5):757–798, 2012.
                                                                 [9] K. Krippendorff. Content analysis: An introduction to
                                                                     its methodology. Sage Publications, Beverly Hills, CA,
                                                                     1980.
                                                                [10] N. Kushman, Y. Artzi, L. Zettlemoyer, and
                                                                     R. Barzilay. Learning to automatically solve algebra
                                                                     word problems. In Proceedings of the 52nd Annual
                                                                     Meeting of the Association for Computational
                                                                     Linguistics (Volume 1: Long Papers), pages 271–281,
                                                                     Baltimore, Maryland, June 2014. Association for
Figure 4: Distribution of column sums by number of edges
                                                                     Computational Linguistics.
in the target node represented by the column. Error plots
show the mean and standard error for each type. Black line      [11] N. Li, W. W. Cohen, and K. R. Koedinger.
is the cubed root of the mean of the column sums of Mbp .            Discovering student models with a clustering
                                                                     algorithm using problem content. In Proceedings of the
                                                                     6th International Conference on Educational Data
lationships with student performance data. We should find,           Mining, 2013.
for example, that if two exercises are conceptually similar,    [12] E. Minkov, W. W. Cohen, and A. Y. Ng. Contextual
then student outcomes should also be similar and learning            search and name disambiguation in email using
curves should reveal shared knowledge components. In this            graphs. In Proceedings of the 29th Annual
respect, such automatically constructed knowledge graphs             International ACM SIGIR Conference on Research
can create more refined domain models that intelligent tu-           and Development in Information Retrieval, SIGIR ’06,
toring systems and robust assessments can be built upon.             pages 27–34, New York, NY, USA, 2006. ACM.
                                                                [13] I. Niles and A. Pease. Mapping WordNet to the
6.   REFERENCES                                                      SUMO ontology. In Proceedings of the IEEE
 [1] Y. An, J. Janssen, and E. E. Milios. Characterizing             International Knowledge Engineering Conference,
     and mining the citation graph of the computer science           pages 23–26, 2003.
     literature. Knowledge and Information Systems,             [14] J. Otterbacher, G. Erkan, and D. R. Radev. Biased
     6(6):664–678, 2004.                                             lexrank: Passage retrieval using random walks with
 [2] J. Berant, V. Srikumar, P.-C. Chen,                             question-based priors. Information Processing and
     A. Vander Linden, B. Harding, B. Huang, P. Clark,               Management, 45(1):42–54, 2009.
     and C. D. Manning. Modeling biological processes for       [15] L. Page, S. Brin, R. Motwani, and T. Winograd. The
     reading comprehension. In Proceedings of the 2014               pagerank citation ranking: Bringing order to the web.
     Conference on Empirical Methods in Natural Language             Technical Report 1999-66, Stanford InfoLab,
     Processing (EMNLP), pages 1499–1510, Doha, Qatar,               November 1999. Previous number =
     October 2014. Association for Computational                     SIDL-WP-1999-0120.
     Linguistics.                                               [16] T. M. Pilehvar, D. Jurgens, and R. Navigli. Align,
 [3] P. Buitelaar, A. Frank, M. Hartung, and S. Racioppa.            disambiguate and walk: A unified approach for
     Ontology-based information extraction and integration           measuring semantic similarity. In Proceedings of the
     from heterogeneous data sources, 2008.                          51st Annual Meeting of the Association for
 [4] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. H.         Computational Linguistics (Volume 1: Long Papers),
     Jr., and T. M. Mitchell. Toward an architecture for             pages 1341–1351. Association for Computational
     never-ending language learning. In Proceedings of the           Linguistics, 2013.
     24th Conference on Artificial Intelligence (AAAI),         [17] J. Scheffczyk, A. Pease, and M. Ellsworth. Linking
     volume 2, pages 1306–1313, 2010.                                FrameNet to the suggested upper merged ontology. In
 [5] G. Erkan and D. R. Radev. Lexrank: Graph-based                  B. Hennett and C. Fellbaum, editors, Formal Ontology
     lexical centrality as salience in text summarization. J.        in Information Systems, pages 289–. IOS Press, 2006.
     Artif. Int. Res., 22(1):457–479, Dec. 2004.                [18] M. J. Seo, H. Hajishirzi, A. Farhadi, and O. Etzioni.
 [6] B. Hixon, P. Clark, and H. Hajishirzi. Learning                 Diagram understanding in geometry questions. In
     knowledge graphs for question answering through                 Proceedings of the 28th AAAI Conference on Artificial
     conversational dialog. In Proceedings of the 2013               Intelligence, 2014.
     Conference of the North American Chapter of the            [19] M. Šimko and M. Bieliková. Automatic concept
     Association for Computational Linguistics: Human                relationships discovery for an adaptive e-course. In
     Language Technologies, Denver, CO, May-June 2015.               Proceedings of the Second International Conference on
 [7] R. J. L. John, R. J. Passonneau, and T. S. McTavish.            Educational Data Mining (EDM), pages 171–179,
     Semantic similarity graphs of mathematics word                  2009.
     problems: Can terminology detection help? In
     Proceedings of the Eighth International Conference on
     Educational Data Mining, 2015.
 [8] K. R. Koedinger, A. T. Corbett, and C. Perfetti. The