Graph-based Educational Data Mining (G-EDM 2015) Collin F. Lynch Dr. Tiffany Barnes Department of Computer Department of Computer Science Science North Carolina State North Carolina State University University Raleigh, North Carolina Raleigh, North Carolina cflynch@ncsu.edu tmbarnes@ncsu.edu Dr. Jennifer Albert Michael Eagle Department of Computer Science Department of Computer North Carolina State University Science Raleigh, North Carolina North Carolina State jlsharp@ncsu.edu University Raleigh, North Carolina mjeagle@ncsu.edu 1. INTRODUCTION Thus, graphs are simple in concept, general in structure, and Fundamentally, a graph is a simple concept. At a basic level a have wide applications for Educational Data Mining (EDM). graph is a set of relationships {e(n0 ,n2 ),e(n0 ,nj ),...,e(nj−1 ,nj )} Despite the importance of graphs to data mining and data anal- between elements. This simple concept, however, has afforded the ysis there exists no strong community of researchers focused on development of a complex theory of graphs [1] and rich algorithms Graph-Based Educational Data Mining. Such a community is for combinatorics [7] and clustering [4]. This has, in turn, made important to foster useful interactions, share tools and techniques, graphs a fundamental part of educational data mining. and to explore common problems. Many types of data can be naturally represented as graphs such 2. GEDM 2014 as social network data, user-system interaction logs, argument This is the second workshop on Graph-Based Educational Data diagrams, logical proofs, and forum discussions. Such data has Mining. The first was held in conjunction with EDM 2014 in grown exponentially in volume as courses have moved online and London [17]. The focus of that workshop was on seeding an initial educational technology has been incorporated into the traditional community of researchers, and on identifying shared problems, and classroom. Analyzing it can help to answer a range of important avenues for research. The papers presented covered a range of top- questions such as: ics including unique visualizations [13], social capital in educational networks [8], graph mining [19, 11], and tutor construction [9]. • What path(s) do high-performing students take through online educational materials? The group discussion sections at that workshop focused on the • What social networks can foster or inhibit learning? distinct uses of graph data. Some of the work presented focused • Do users of online learning tools behave as the system designers on student-produced graphs as solution representations (e.g. [14, expect? 3]) while others focused more on the use of graphs for large-scale • What diagnostic substructures are commonly found in student- analysis to support instructors or administrators (e.g. [18, 13]). produced diagrams? These differing uses motivate different analytical techniques and, • Can we use prior student data to identify students’ solution as participants noted, change our underlying assumptions about plan, if any? the graph structures in important ways. • Can we use prior student data to provide meaningful hints in complex domains? • Can we identify students who are particularly helpful based 3. GEDM 2015 upon their social interactions? Our goal in this second workshop was to build upon this nascent community structure and to explore the following questions: 1. What common goals exist for graph analysis in EDM? 2. What shared resources such as tools and repositories are re- quired to support the community? 3. How do the structures of the graphs and the analytical methods change with the applications? The papers that we include here fall into four broad categories: interaction, induction, assessment, and MOOCs. Work by Poulovassilis et al. [15] and Lynch et al. [12] focuses Data Mining 2014, co-located with 7th International on analyzing user-system interactions in state based learning Conference on Educational Data Mining (EDM environments. Poulovassilis et al. focuses on the analyses of 2014), London, United Kingdom, July 4-7, 2014., volume individual users’ solution paths and presents a novel mechanism 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014. to query solution paths and identify general solution strategies. [4] M. Girvan and M. E. J. Newman. Community Lynch et al. by contrast, examined user-system interactions from structure in social and biological networks. Proc. of the existing model-based tutors to examine the impact of specific National Academy of Sciences, 99(12):7821–7826, June 2002. design decisions on student performance. [5] J. Guerra. Graph analysis of student model networks. In C. F. Lynch, T. Barnes, Price & Barnes [16] and Hicks et al. [6] focus on applying these J. Albert, and M. Eagle, editors, Proceedings of the Second same analyses in the open-ended domain of programming. Unlike International Workshop on Graph-Based Educational Data more discrete tutoring domains where users enter single equations Mining (GEDM 2015). CEUR-WS, June 2015. (in press). or select actions, programming tutors allow users to make drastic [6] A. Hicks, V. Catete, R. Zhi, changes to their code on each step. This can pose challenges for Y. Dong, and T. Barnes. Bots: Selecting next-steps from data-driven methods as the student states are frequently unique player traces in a puzzle game. In C. F. Lynch, T. Barnes, and admit no easy single-step advice. Price and Barnes present a J. Albert, and M. Eagle, editors, Proceedings of the Second novel method for addressing the data sparsity problem by focusing International Workshop on Graph-Based Educational Data on minimal-distance changes between users [16] while in related Mining (GEDM 2015). CEUR-WS, June 2015. (in press). work Hicks et al. focuses on the use of path weighting to select [7] D. E. Knuth. The actionable advice in a complex state space [6]. Art of Computer Programming: Combinatorial Algorithms, Part 1, volume 4A. Addison-Wesley, 1st edition, 2011. The goal in much of this work is to identify rules that can [8] V. Kovanovic, S. Joksimovic, D. Gasevic, and M. Hatala. be used to characterize good and poor interactions or good and What is the source of social capital? the association poor graphs. Xue at al. sought address this challenge in part via between social network position and social presence the automatic induction of graph rules for student-produced dia- in communities of inquiry. In S. G. Santos and O. C. Santos, grams [22]. In their ongoing work they are applying evolutionary editors, Proceedings of the Workshops held at Educational computation to the induction of Augmented Graph Grammars, Data Mining 2014, co-located with 7th International a graph-based formalism for rules about graphs. Conference on Educational Data Mining (EDM 2014), London, United Kingdom, July 4-7, 2014., volume The work described by Leo-John et al. [10], Guerra [5] and 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014. Weber & Vas [21], takes a different tack and focuses not on graphs [9] R. Kumar. Cross-domain performance of automatic tutor representing solutions or interactions but on relationships. Leo- modeling algorithms. In S. G. Santos and O. C. Santos, John et al. present a novel approach for identifying closely-related editors, Proceedings of the Workshops held at Educational word problems via semantic networks. This work is designed to Data Mining 2014, co-located with 7th International support content developers and educators in examining a set of Conference on Educational Data Mining (EDM questions and in giving appropriate assignments. Guerra takes 2014), London, United Kingdom, July 4-7, 2014., volume a similar approach to the assessment of users’ conceptual changes 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014. when learning programming. He argues that the conceptual [10] R.-J. Leo-John, T. McTavish, and R. Passonneau. relationship graph affords a better mechanism for automatic as- Semantic graphs for mathematics word problems based sessment than individual component models. This approach is on mathematics terminology. In C. F. Lynch, T. Barnes, also taken up by Weber and Vas who present a toolkit for graph- J. Albert, and M. Eagle, editors, Proceedings of the Second based self-assessment that is designed to bring these conceptual International Workshop on Graph-Based Educational Data structures under students’ direct control. Mining (GEDM 2015). CEUR-WS, June 2015. (in press). And finally, Vigentini & Clayphan [20], and Brown et al. [2] [11] C. F. Lynch. AGG: augmented graph grammars for complex focus on the unique problems posed by MOOCs. Vigentini and heterogeneous data. In S. G. Santos and O. C. Santos, Clayphan present work on the use of graph-based metrics to editors, Proceedings of the Workshops held at Educational assess students’ on-line behaviors. Brown et al., by contrast, focus Data Mining 2014, co-located with 7th International not on local behaviors but on social networks with the goal of Conference on Educational Data Mining (EDM identifying stable sub-communities of users and of assessing the 2014), London, United Kingdom, July 4-7, 2014., volume impact of social relationships on users’ class performance. 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014. [12] C. F. Lynch, T. W. Price, M. Chi, and T. Barnes. Using the hint factory to analyze 4. REFERENCES model-based tutoring systems. In C. F. Lynch, T. Barnes, [1] B. Bollobás. J. Albert, and M. Eagle, editors, Proceedings of the Second Modern Graph Theory. Springer Science+Business International Workshop on Graph-Based Educational Data Media Inc. New York, New York, U.S.A., 1998. Mining (GEDM 2015). CEUR-WS, June 2015. (in press). [2] R. Brown, C. F. Lynch, Y. Wang, [13] T. McTavish. Facilitating graph interpretation via interactive M. Eagle, J. Albert, T. Barnes, R. Baker, Y. Bergner, hierarchical edges. In S. G. Santos and O. C. Santos, and D. McNamara. Communities of performance editors, Proceedings of the Workshops held at Educational & communities of preference. In C. F. Lynch, T. Barnes, Data Mining 2014, co-located with 7th International J. Albert, and M. Eagle, editors, Proceedings of the Second Conference on Educational Data Mining (EDM International Workshop on Graph-Based Educational Data 2014), London, United Kingdom, July 4-7, 2014., volume Mining (GEDM 2015). CEUR-WS, June 2015. (in press). 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014. [3] R. Dekel and K. Gal. On-line plan recognition in exploratory [14] B. Mostafavi and T. Barnes. Evaluation of logic proof problem learning environments. In S. G. Santos and O. C. Santos, difficulty through student performance data. In S. G. Santos editors, Proceedings of the Workshops held at Educational and O. C. Santos, editors, Proceedings of the Workshops Data Mining 2014, co-located with 7th International held at Educational Data Mining 2014, co-located with 7th Conference on Educational Data Mining (EDM International Conference on Educational Data Mining (EDM 2014), London, United Kingdom, July 4-7, 2014., volume 2014), London, United Kingdom, July 4-7, 2014., volume 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014. 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014. [19] K. Vaculı́k, L. Nezvalová, and L. Popelı́nsky. Graph mining [15] A. Poulovassilis, S. G. Santos, and M. Mavrikis. Graph-based and outlier detection meet logic proof tutoring. In S. G. Santos modelling of students’ interaction data from exploratory and O. C. Santos, editors, Proceedings of the Workshops learning environments. In C. F. Lynch, T. Barnes, held at Educational Data Mining 2014, co-located with 7th J. Albert, and M. Eagle, editors, Proceedings of the Second International Conference on Educational Data Mining (EDM International Workshop on Graph-Based Educational Data 2014), London, United Kingdom, July 4-7, 2014., volume Mining (GEDM 2015). CEUR-WS, June 2015. (in press). 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014. [16] T. Price and T. Barnes. An [20] L. Vigentini and A. Clayphan. Exploring the function exploration of data-driven hint generation in an open-ended of discussion forums in moocs: comparing data mining programming problem. In C. F. Lynch, T. Barnes, and graph-based approaches. In C. F. Lynch, T. Barnes, J. Albert, and M. Eagle, editors, Proceedings of the Second J. Albert, and M. Eagle, editors, Proceedings of the Second International Workshop on Graph-Based Educational Data International Workshop on Graph-Based Educational Data Mining (GEDM 2015). CEUR-WS, June 2015. (in press). Mining (GEDM 2015). CEUR-WS, June 2015. (in press). [17] S. G. Santos and O. C. Santos, [21] C. Weber and R. Vas. Studio: Ontology-based editors. Proceedings of the Workshops held at Educational educational self-assessment. In C. F. Lynch, T. Barnes, Data Mining 2014, co-located with 7th International J. Albert, and M. Eagle, editors, Proceedings of the Second Conference on Educational Data Mining (EDM International Workshop on Graph-Based Educational Data 2014), London, United Kingdom, July 4-7, 2014, volume Mining (GEDM 2015). CEUR-WS, June 2015. (in press). 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014. [22] L. Xue, C. F. Lynch, and M. Chi. Graph grammar induction [18] V. Sheshadri, C. Lynch, and T. Barnes. by genetic programming. In C. F. Lynch, T. Barnes, Invis: An EDM tool for graphical rendering and analysis of J. Albert, and M. Eagle, editors, Proceedings of the Second student interaction data. In S. G. Santos and O. C. Santos, International Workshop on Graph-Based Educational Data editors, Proceedings of the Workshops held at Educational Mining (GEDM 2015). CEUR-WS, June 2015. (in press). Graph Grammar Induction via Evolutionary Computation Linting Xue Collin F.Lynch Min Chi Electrical Engineering Computer Science Computer Science Department Department Department North Carolina State North Carolina State North Carolina State University University University Raleigh, North Carolina, Raleigh, North Carolina, Raleigh, North Carolina, U.S.A. U.S.A. U.S.A. lxue3@ncsu.edu cflynch@ncsu.edu mchi@ncsu.edu ABSTRACT In this paper we describe our ongoing work on the automatic Augmented Graph Grammars provide a robust formalism induction of Augmented Graph Grammars via Evolutionary for representing and evaluating graph structures. With the Computation (EC). Our long-term goal in this work is to advent of robust graph libraries such as AGG, it has be- develop automated techniques that can extract empirically- come possible to use graph grammars to analyze realistic valid graph rules which can, in turn, be used to classify data. Prior studies have shown that graph rules can be used student-produced argument diagrams and to provide the ba- to evaluate student work and to identify empirically-valid sis for automated student guidance and evaluation. This will substructures using hand-authored rules. In this paper we build upon our prior on the evaluation of a-priori rules for describe proposed work on the automatic induction of graph student arguments. We will begin with background material grammars for student data using evolutionary computation on Augmented Graph Grammars and discuss prior work on via the pyEC system. grammar induction. We will then present an overview of our planned work. Keywords Augmented Graph Grammars, Graph Data, Evolutionary 2. AUGMENTED GRAPH GRAMMARS Computation & ARGUMENT DIAGRAMS Classical graph grammars are designed to deal with fixed 1. INTRODUCTION graphs that are composed from a finite set of static node and Graph Grammars are logical rule representations for graph arc types. Augmented graph grammars are an extension of structures. They can be designed to encode classes of suit- simple graph grammars that allow for complex node and arc able graphs to recognize complex sub-features. They were types, optional substructures, and complex rule expressions introduced by Rosenfeld and Pfaltz in 1969 as “Context-free [12]. Rather than using a fixed alphabet of graph compo- web grammars” [1]. Since then Graph grammars have been nents they are defined by a complex ontology that allows for applied to a wide range of areas, including pattern recogni- subsidiary types such as textual fields, access functions, and tion [2, 3, 4]; visual programming languages [5]; biological directional information. They can also be used to evaluate development [6]; classification of chemical compounds [7, 8]; negated elements as well as quantified expressions. As such and social network analysis [9, 10, 11]. Simple graph gram- they are better suited to rich graph data such as user-system mars are, like string grammars, composed of a set of pro- interaction logs and student-produced argument diagrams. duction rules that map from one structure to another. In this case the rules map from a simple subgraph, typically a Augmented Graph Grammars have previously been used for single node or arc, to a more complex structure. As with the detection of empirically-valid substructures in student- string grammars the node and arc types are drawn from produced argument diagrams [13, 14]. In that work a-priori finite alphabets. Despite their utility, however, graph gram- rules were used to represent key discussion features and ar- mars are very difficult to construct. The data structures gumentative flaws. Argument diagrams are graphical ar- required are complex [5]. Moreover, development of suitable gument representations that reify key features of arguments graph grammars generally requires considerable domain ex- such as hypothesis statements, claims, and citations as nodes pertise. Most existing uses of graph grammars have relied and the supporting, opposing, and informational relation- on hand-authored rules. ships as arcs between them. A sample diagram collected in that work is shown in Figure 1 The diagram includes a central research claim node, which has a single text field indicating the content of the research claim. A set of citation nodes are connected to the claim node via a set of supporting, opposing and undefined arcs colored with green, red and blue respectively. Each citation node contains two fields: one for the citation information, and the other for a summary of the cited work; each arc has a single text field explaining why the relationship holds. At Figure 1: A student-produced LASAD argument diagram representing an introductory argument. the bottom of the diagram, there is a single isolated hypoth- t esis node that contains two text fields, one for a conditional or IF field, and the other for conditional or THEN field. We expect the induced graph grammars from a set of argument (P aredW comp) O S diagrams can be used to evaluate the student thesis work. a ¬c b Figure 2 shows an a-priori rule that was defined as part of that work. This rule is designed to identify a subgraph  t.T pye = “claim00 or“hypothesis00   where a single target node t is connected to two separate  a.T ype = “Citation00   citation nodes a and b such that: a is connected to t via an   00 opposing path; b is connected via a supporting path; and  b.T ype = “Citation  c.T ype = “Comparison00   there exists no comparison arc between a and b. The rules   in that study were implemented using AGG an augmented graph grammar library built in Python [12]. AGG matches Figure 2: A simple augmented graph grammar rule the graphs using recursive stack-based algorithm. The code that detects compared counterarguments. The rule first matches the ground nodes at the top-level of the class shows a two citation nodes (a, & b) that have oppos- (t, a, & b). It then tests for the recursive productions O, ing relationships with a shared claim node (t) and do and S, before finally testing for the negated comparison arc not have a comparison arc (c) drawn between them. c. This rule does not make use of the full range of potential The arcs S and O represent recursive supporting capacity for Augmented Graph Grammars. However it is and opposing path. illustrative of the type of rules we plan to induce here, rules that generalize beyond basic types and draw on existing pro- duction classes but not, at least in the immediate term, use complex textual elements or functional features. algorithm [19]. They are based upon controlled graph walks coupled with indexing. While these algorithms are effective, 3. GRAMMAR INDUCTION particularly on smaller graphs, with low vertex degree they Graph and relational data has grown increasingly prevalent can also overfit simpler graph structures and they do not and graph analysis algorithms have been applied in a wide scale well to larger, denser graph data [20]. range of domains from social network analysis [15] to bioin- formatics [16]. Most of this work falls into one of two cate- The SUBDUE system takes a greedy-compression approach gories of algorithms: frequent subgraph matching, and graph to graph mining. SUBDUE searches for candidate sub- compression. graphs that can best compress the input graphs by replacing a candidate subgraph with a single vertex. Then nodes and A number of algorithms have been developed to discover fre- arcs are added to the vertices to form new candidate sub- quent subgraphs. These include the gSpan algorithm devel- graphs. The process is recursive and relies on the Minimum- oped by Yan and Han [17]; the AGM algorithm developed by Description-Length (MDL) principle to evaluate the candi- Inokuchi et al [18]; and the FSG algorithm developed by Ku- dates. SUBDUE has been applied successfully to extract ramochi and Karypis which is based on the previous Apriori structure from visual programming [5], web search [21], and analyzing user behaviors in games [22]. lution space. Therefore it is well suited to our present needs. This flexibility is costly, however, as EC is far more compu- While these methods are successful they have practical and tationally expensive than more specialized algorithms, and theoretical limitations. Both classes of approaches are lim- applications of EC can require a great deal of tuning for ited to static graphs composed from a finite alphabet of node each use. In the subsections below we will describe the fit- and arc types. The frequent subgraph approaches are based ness function and the operators that we will use in this work. upon iterative graph walks and can be computationally ex- For this work we will rely on pyEC a general purpose evo- pensive and are limited to finding exact matches. They do lutionary computation engine that we have developed [26]. not generalize beyond the exact graphs matched nor do they allow for recursive typing. SUBDUE, by contrast is a greedy 4.2 Dataset algorithm that finds the single most descriptive grammar Our initial analysis will be based upon a corpus of expert and does not allow for weighted matches. graded student produced argument diagrams and essays pre- viously described in [13, 14]. That dataset was collected as For our present purposes, however, our goal is to identify part of a study on students’ use of argument diagrams for multiple heirarchical classes of the type shown in Figure 2 writing that was conducted at the University of Pittsburgh that can: generalize beyond exact node and arc types; can in 2011. For that study we selected a set of students in an draw on recursive rule productions; and can be weighted undergraduate-level course on Psychological Research Meth- based upon the graph quality. Moreover our long-term goal ods. As part of the course the students were tasked with with this work is to explore graph rule induction mechanisms planning and executing an empirical research study and then that can be expanded to include textual rules and complex drafting a written report. The students were permitted to constraints. For that reason we have elected to apply evo- work individually or in teams. This report was structured as lutionary computation. This is a general-purpose machine a standard empirical workshop paper. Prior to drafting the learning mechanism that can be tunes to explore a range of report the students were tasked with diagramming the ar- possible induction mechanisms. gument that they planned to make using LASAD an online tool for argument diagramming and annotation. 4. METHODS Subsequent to this data collection process the diagrams and 4.1 Evolutionary Computation essays were graded by an experienced TA using a set of par- Evolutionary Computation (EC) is a general class of ma- allel grading rubrics. These rubrics focused on the quality of chine learning and optimization methods that are inspired the arguments in the diagrams and essays and were used to by the process of Darwinian evolution through natural se- demonstrate that the structure and quality of the diagrams lection [23] such as Genetic Algorithms [24] or Genetic Pro- can be used to predict the students’ subsequent essay per- gramming [25]. EC algorithms begin with a population of formance. These grades will be used as the weighting metric randomly generated candidate solutions such as snippets of for the diagrams and will be correlated with performance as random code, strings representing a target function, or for- part of the fitness function we describe below. After comple- mal rules. Each of these solutions is ranked by a fitness tion of the data collection, grading, and testing phases and function that is used to evaluate the quality of the individu- accounting for student dropout and incomplete assignments als. These functions can be defined by absolute measures of we collected 105 graded diagram-essay pairs 74 of which were success such as a suite of test cases, or by relative measures authored by teams. such as a competition between chess-playing systems. 4.3 Solution Representation Once the individuals have been ranked a new generation of For the purposes of our present experiments we will use a individuals is produced through a combination of crossover restricted solution representation that relies on a subset of and mutation operations. Crossover operations combine two the augmented graph grammar formalism exemplified by the or more parents to produce one or more candidate children. rule shown in Figure 2. This will include only element types In Genetic Algorithms where the candidate solutions are rep- and recursive productions. In future work we plan to sup- resented as strings this can be accomplished by splitting two port the induction of more complex rules defined by multiple parents at a given index and then exchanging the substrings graph classes, novel productions, and expressions. However to produce novel children. In Genetic Programming the par- for the present study we will focus on the simple case of ents exchange blocks of code, functions, or subtrees. Muta- individual classes coupled with predefined productions. tion operations alter randomly-selected parts of a candidate solution by swapping out one symbol or instruction for an- 4.4 Fitness Function other, adding new sub-solutions, or deleting components. We plan to use the frequency correlation metric previously This process of ranking and regeneration will iterate until employed in [13, 14]. In that study the authors assessed the a target performance threshold is reached or a maximum empirical validity of a set of a-priori diagram rules. The number of generations has passed. validity of each individual rule was assessed by testing the correlation between the frequency of the class in the existing EC methods are highly general algorithms that can be read- graph and the graph grade. The strength of that correlation ily adapted to novel domains by selecting an appropriate was estimated using Spearman’s ρ a non-parametic measure solution representation and modification operations. Thus, of correlation [27]. In that work the authors demonstrated in contrast to more specific methods such as SUBDUE, the that the a-priori rule frequency was correlated with stu- EC algorithm allows us to tune the inductive bias of our dents’ subsequent essay grades and showed that the frequen- search and to explore alternative ways of traversing the so- cies could be used to predict students’ future performance. 4.5 Mutation Our mutation operator will draw on the predefined graph ontology to make atomic changes to an existing graph class. B C D F G The change will be one of the following operations: 1 ∅ 3 A 7 ∅ E Change Node change an existing node’s type. Change Arc Change an existing arc’s type or orientation. 4 5 B ∅ F Delete Node Delete a node and its associated arcs. ∅ C Delete Arc Delete an existing arc. Add Node Add a novel node with a specified type. Figure 3: Canonical matricies for crossover parents. Add Arc Add an arc between existing nodes or add with new nodes. 4.6 Crossover B G D F C By design the crossover operation should, like genetic crossover, be conservative. Two very similar parents should produce 1 ∅ 3 E 7 ∅ A similar offspring. Crossover operations should therefore pre- serve good building blocks and sub-solutions or introns through random behavior [25]. Arbitrary graph alignment and crossover ∅ 5 B 4 F is a challenging problem that risks causing unsustainable changes on each iteration. We therefore treat graph crossover as a matrix problem. ∅ G For each pair of parent classes we will define a pair of di- agonal matricies of the type illustrated in Figure 3. The Figure 4: Canonical matricies for crossover children. letter indicies on the top and right indicate nodes while the numerical indicies internally indicate arcs, and the ∅ symbol indicates that no arc is present. The matricies are gener- grading and feedback. This work represents an improve- ated in a canonical order based upon the order in which the ment over prior graph grammar induction algorithms which nodes were added to the class. Thus on each iteration of the are limited to classical graph grammars and greedy extrac- crossover process the corresponding elements will obtain the tion. This work also represents an extension for evolution- same index. As a consequence good subsolutions will obtain ary computation by shifting it into a new domain. As part the same location and will tend to be preserved over time. of this work we also plan to explore additional extensions to the standard evolutionary computation algorithm to address Once a set of parent matricies has been generated we then problems of over-fitting such as χ2 reduction. generate two child matricies of the same size as the parents and then randomly select the node and arc members. In the example shown in figures 3 and 4 the parents have nodes 6. REFERENCES {A,B,C,D} and {E,F,G} while the children have {E,B,G,D} [1] John L Pfaltz and Azriel Rosenfeld. Web grammars. and {A,F,C}. Thus we align the nodes in canonical order In Proceedings of the 1st international joint conference and, for each node pair, we flip a coin to decide where they on Artificial intelligence, pages 609–619. Morgan are copied. If one parent is larger than the other than any Kaufmann Publishers Inc., 1969. additional nodes, in this case D, will be copied to the larger [2] John L Pfaltz. Web grammars and picture description. child. We then perform a comparable exchange process for Computer Graphics and Image Processing, the arcs. Each arc or potential arc is defined uniquely in the 1(2):193–220, 1972. matricies by its endpoints. We thus align the lists of arcs [3] Horst Bunke. Attributed programmed graph in a comparable manner and then decide randomly which grammars and their application to schematic diagram arc, or empty arc, to copy. As with the nodes, extra arcs interpretation. IEEE Transactions on Pattern from the larger parent, in this case 3,5, and one ∅ are copied Analysis and Machine Intelligence, 4(6):574–582, 1982. directly into the larger of the two children. [4] Michihiro Kuramochi and George Karypis. Finding frequent patterns in a large sparse graph*. Data 5. FUTURE WORK mining and knowledge discovery, 11(3):243–271, 2005. In this paper we presented a method for the induction of [5] Keven Ates, Jacek Kukluk, Lawrence Holder, Diane augmented graph grammars through evolutionary computa- Cook, and Kang Zhang. Graph grammar induction on tion. We are presently applying this work to the automatic structural data for visual programming. In Tools with induction of empirically-valid rules for student-produced ar- Artificial Intelligence, 2006. ICTAI’06. 18th IEEE gument diagrams. This work will serve to extend our prior International Conference on, pages 232–242. IEEE, efforts on the use of augmented graph grammars for student 2006. [6] Francesc Rosselló and Gabriel Valiente. Graph CSREA Press, 2008. transformation in molecular biology. In Formal [17] Xifeng Yan and Jiawei Han. gspan: Graph-based Methods in Software and Systems Modeling, pages substructure pattern mining. In Proceedings of the 116–133. Springer, 2005. IEEE International Conference on Data Mining [7] Luc Dehaspe, Hannu Toivonen, and Ross D King. (ICDM 2002), pages 721–724. IEEE, 2002. Finding frequent substructures in chemical [18] Akihiro Inokuchi, Takashi Washio, and Hiroshi compounds. In KDD, volume 98, page 1998, 1998. Motoda. An apriori-based algorithm for mining [8] Stefan Kramer, Luc De Raedt, and Christoph Helma. frequent substructures from graph data. In Principles Molecular feature mining in hiv data. In Proceedings of Data Mining and Knowledge Discovery, pages of the seventh ACM SIGKDD international conference 13–23. Springer, 2000. on Knowledge discovery and data mining, pages [19] Michihiro Kuramochi and George Karypis. Frequent 136–143. ACM, 2001. subgraph discovery. In Proceedings IEEE International [9] Wenke Lee and Salvatore J Stolfo. A framework for Conference on Data Mining. (ICDM 2001), pages constructing features and models for intrusion 313–320. IEEE, 2001. detection systems. ACM transactions on Information [20] MICHIHIRO Kuramochi and George Karypis. Finding and system security (TiSSEC), 3(4):227–261, 2000. topological frequent patterns from graph datasets. [10] Calvin Ko. Logic induction of valid behavior Mining Graph Data, pages 117–158, 2006. specifications for intrusion detection. In Proceedings of [21] Nitish Manocha, Diane J Cook, and Lawrence B the IEEE Symposium on Security and Privacy. (S&P Holder. Cover story: structural web search using a 2000), pages 142–153. IEEE, 2000. graph-based discovery system. Intelligence, [11] Rakesh Agrawal, Ramakrishnan Srikant, et al. Fast 12(1):20–29, 2001. algorithms for mining association rules. In Proceedings [22] Diane J. Cook, Lawrence B. Holder, and G. Michael of the 20th International Conference on very large Youngblood. Graph-based analysis of human transfer data bases, VLDB, volume 1215, pages 487–499, 1994. learning using a game testbed. IEEE Trans. on [12] Collin F Lynch. Agg: Augmented graph grammars for Knowl. and Data Eng., 19:1465–1478, November 2007. complex heterogeneous data. In Proceedings of the [23] Charles Darwin. On the Origin of Species by Means of first international workshop on Graph-Based Natural Selection, or the Preservation of Favoured Educational Data Mining (GEDM 2014). Races in the Struggle for Life. John Murray: [13] Collin F. Lynch and Kevin D. Ashley. Empirically Albermarle Street: London, United Kingdom, 6 valid rules for ill-defined domains. In John Stamper edition, 1872. and Zachary Pardos, editors, Proceedings of The 7th [24] Melanie Mitchell. An Introduction to Genetic International Conference on Educational Data Mining Algorithms. MIT Press: Cambridge, Massachusetts, (EDM 2014). International Educational Datamining 1999. Society IEDMS, 2014. [25] Wolfgang Banzhaf. Genetic programming: an [14] Collin F. Lynch, Kevin D. Ashley, and Min Chi. Can introduction on the automatic evolution of computer diagrams predict essay grades? In Stefan programs and its applications. Morgan Kaufmann Trausan-Matu, Kristy Elizabeth Boyer, Martha E. Publishers ; Heidelburg : Dpunkt-verlag; San Crosby, and Kitty Panourgia, editors, Intelligent Francisco, California, 1998. Tutoring Systems, Lecture Notes in Computer Science, [26] Collin F. Lynch, Kevin D. Ashley, Niels Pinkwart, and pages 260–265. Springer, 2014. Vincent Aleven. Argument graph classification with [15] Sherry E. Marcus, Melanie Moy, and Thayne Coffman. genetic programming and c4.5. In Ryan Social network analysis. In Diane J. Cook and Shaun Joazeiro de Baker, Tiffany Barnes, and Lawrence B. Holder, editors, Mining Graph Data, Joseph E. Beck, editors, The 1st International chapter 17, pages 443–468. John Wiley & Sons, 2006. Conference on Educational Data Mining, Montreal, [16] Chang Hun You, Lawrence B. Holder, and Diane J. Québec, Canada, June 20-21, 2008. Proceedings, pages Cook. Dynamic graph-based relational learning of 137–146, 2008. temporal patterns in biological networks changing over [27] Wikipedia. Spearman’s rank correlation coefficient — time. In Hamid R. Arabnia, Mary Qu Yang, and wikipedia, the free encyclopedia, 2013. [Online; Jack Y. Yang, editors, BIOCOMP, pages 984–990. accessed 27-February-2013]. Communities of Performance & Communities of Preference Rebecca Brown Collin Lynch Yuan Wang North Carolina State North Carolina State Teachers College, Columbia University University University Raleigh, NC Raleigh, NC New York, NY rabrown7@ncsu.edu cflynch@ncsu.edu elle.wang@columbia.edu Michael Eagle Jennifer Albert Tiffany Barnes North Carolina State University North Carolina State North Carolina State Raleigh, NC University University mjeagle@ncsu.edu Raleigh, NC Raleigh, NC jennifer_albert@ncsu.edu tmbarnes@ncsu.edu Ryan Baker Yoav Bergner Danielle McNamara Teachers College, Columbia Educational Testing Service Arizona State University University Princeton, NJ Phoenix, AZ New York, NY ybergner@gmail.com dsmcnamara1@gmail.com ryanshaunbaker@gmail.com ABSTRACT the performance of weaker ones. It has not yet been shown, The current generation of Massive Open Online Courses (MOOCs) however, that this type of support occurs in practice. operate under the assumption that good students will help poor students, thus alleviating the burden on instructors and Teaching Prior research on social networks has shown that social groups, Assistants (TAs) of having thousands of students to teach. In even those that gather face-to-face, can fragment into disjoint practice, this may not be the case. In this paper, we examine so- sub-communities [37]. This small-group separation, if it takes cial network graphs drawn from forum interactions in a MOOC place in an online course, can be considered negative or positive, to identify natural student communities and characterize them depending on one’s perspective. If poor students communi- based on student performance and stated preferences. We exam- cate only with similarly-floundering peers, then they run the ine the community structure of the entire course, students only, risk of perpetuating misunderstandings and of missing insights and students minus low performers and hubs. The presence of discussed by better-performing peers and teaching staff. An these communities and the fact that they are homogeneous with instructor may wish to avoid this fragmentation to encourage respect to grade but not motivations has important implications poor students to connect with better ones. for planning in MOOCs. These enduring subgroups may be beneficial, however, by help- Keywords ing students to form enduring supportive relationships. Research MOOC, social network, online forum, community detection by Li et al. has shown that such enduring relationships can enhance students’ social commitment to a course [18]. We be- lieve that this social commitment will in turn help to reduce 1. INTRODUCTION feelings of isolation and alienation among students in a course. The current generation of Massive Open Online Courses (MOOCs) Eckles and Stradley [9] have shown that such isolation is a key is designed to leverage student interactions to augment instruc- predictor of student dropout. tor guidance. The activity in courses on sites such as Coursera and edX is centered around user forums that, while curated We have previously shown that students can form stable com- and updated by instructors and TAs, are primarily constructed munities and that those communities are homogeneous with by students. When planning and building these courses, it is respect to performance [3]. However that work did not: show hoped that students will help one another through the course whether these results are consistent with prior work on imme- and that interacting with stronger students will help to improve diate peer relationships; address the impact of hub students on these results; or discuss whether students’ varying goals and preferences motivate the community structure. Our goal in this paper is to build upon our prior work by addressing these issues. In the remainder of this paper we will survey prior educational literature on community formation in traditional and online classrooms. We will then build upon our prior work by exam- ining the impact of hub users. And we will look at the impact of user motivations on community formation. 2. RELATED WORK 2.2 Communities, Hubs, & Peers Kovanovic et al. [15] examined the relationship between social 2.1 MOOCs, Forums, & Student Performance network position or centrality, and social capital formation in A survey of the literature on MOOCs shows the beginnings of a courses. Their work is specifically informed by the Community research base generating an abundance of data that has not yet of Inquiry (COI) framework. the COI framework is focused on been completely analyzed [19]. According to Seaton et al. [29], distance education and is particularly suited to online courses of most of the time students spend on a MOOC is spent in dis- the type that we study here. The model views course behavior cussion forums, making them a rich and important data source. through three presences which mediate performance: cognitive, Stahl et al. [30] illustrates how through this online interaction teaching, and social. students collaborate to create knowledge. Thus students’ forum activity is good not only for the individual student posting con- This social presence considers the nature and persistence of tent or receiving answers, but for the class as a whole. Huang et student interactions and the extent to which they reinforce stu- al. [14] investigated the behavior of the highest-volume posters dents’ behaviors. In their analysis, the authors sought to test in 44 MOOC-related forums. These “superposters” tended to whether network relationships, specifically students’ centrality enroll in more courses and do better in those courses than the in their social graph, is related to their social performance as average. Their activity also added to the overall volume of forum measured by the nature and type of their interactions. To that content and they left fewer questions unanswered in the forums. end, they examined a set of course logs taken from a series of Huang et al. also found that these superposters did not suppress online courses offered within a public university. They found the activity of less-active users. Rienties et al. [25] examined the that students’ position within their social graph was positively way in which user interaction in MOOCs is structured. They correlated with the nature and type of their interactions, thus found that allowing students to self-select collaborators is more indicating that central players also engaged in more useful social conducive to learning than randomly assigning partners. Further, interactions. They did not extend this work to groups, however, Van Dijk et al. [31] found that simple peer instruction is signif- focusing solely on individual hub students. icantly less effective in the absence of a group discussion step, pointing again to the importance of a class discussion forum. Other authors have also examined the relationship between network centrality, neighbor relationships, network density, and More recently Rosé et al. [27] examined students’ evolving inter- student performance factors. Eckles and Stradley [9] applied actions in MOOCs using a Mixed-Membership Stochastic Block network analysis to student attrition, finding that students with model which seeks to detect partially overlapping communities. strong social relationships with other students who drop out They found that the likelihood that students would drop out are significantly more likely to drop out themselves. Rizzuto of the course is strongly correlated with their community mem- et al. [26] studied the impact of social network density on stu- bership. Students who actively participated in forums early in dent performance. Network density is defined as the fraction the course were less likely to drop out later. Furthermore, they of possible edges that are present in a given graph. Thus it found one forum sub-community that was much more prone is a measure of how “clique-like” the graph is. The authors to dropout than the rest of the class, suggesting that MOOC examined self-reported social networks for students in a large communities are made up of students who behave in similar traditional undergraduate psychology course. They found that ways. This community can in turn reflect or impact a student’s denser social networks were significantly correlated with per- level of motivation and their overall experience in a course much formance. However, a dominance analysis [1] showed that this like the “emotional contagion” model used in the Facebook mood factor was less predictive than pure academic ability. These re- manipulation study by Kramer, Guillroy, and Hancock [16]. sults serve to motivate a focus on the role of social relationships in student behavior. Their analysis is complicated, however, by Yang et al. [36] also notes that unlike traditional courses stu- their reliance on self-report data which will skew the strength dents can join MOOCs at different times and observed that and recency of the reported relationships. students who join a course early are more likely to be active and connected in the forums, and less likely to drop out, than Fire et al. [11] studied student interaction in traditional class- those who join later. MOOCs also attract users with a range of rooms, constructing a social network based on cooperation on individual motivations. In a standard classroom setting students class assignments. Students were linked based on partnership on are constrained by availability, convention, and goals. Few stu- group work as well as inferred cooperation based on assignment dents enroll in a traditional course without seeking to complete submission times and IP addresses. The authors found that a it and to get formal credit for doing so. MOOCs by virtue of student’s grade was significantly correlated with the grade of their openness and flexibility attract a wide range of students the student with the strongest links to that student in the social with unique personal motivations [10]. Some join the course network. We perform similar analysis in this paper to examine with the intent of completing it. Others may seek only to brush whether the same correlation exists in MOOCs. up on existing knowledge, obtain specific skills, or just watch the videos. These distinct motivations in turn lend themselves Online student interaction in blended courses has also been to different in-class behaviors including assignment viewing and linked to course performance. Dawson [8] extracted student forum access. The impact of user motivations in online courses and instructor social networks from a blended course’s online has been previously discussed by Wang et al. [32, 33]; we will discussion forums and found that students in the 90th grade build upon that work here. Thus it is an open question whether percentile had larger social networks than those in the 10th these motivations affect students’ community behaviors or not. percentile. The study also found that high-performing students primarily associated with other high-performing students and were more likely to be connected to the course instructor, while low-performing students tended to associate with other low- performers. In a blended course, this effect may be offset by the same material as a graduate-level course, Core Methods face-to-face interaction not captured in the online social network, in Educational Data Mining, at Teachers College Columbia but if the same separation happens in MOOC communities, low- University. The MOOC spanned from October 24, 2013 to performing students are less likely to have other chances to learn December 26, 2013. The weekly course was composed of lecture from high-performing ones. videos and 8 weekly assignments. Most of the videos contained in-video quizzes (that did not count toward the final grade). 2.3 Community Detection One of the primary activities students engage in on forums All of the weekly assignments were structured as numeric input is question answering. Zhang et al. [38] conducted a social or multiple-choice questions. The assignments were graded au- network analysis on an online question-and-answer forum about tomatically. In each assignment, students were asked to conduct Java programming. Using vertex in-degree and out-degree, they analyses on a data set provided to them and answer questions were able to identify a relatively small number of active users about it. In order to receive a grade, students had to com- who answered many questions. This allowed the researchers to plete this assignment within two weeks of its release with up develop various algorithms for calculating a user’s Java expertise. to three attempts for each assignment, and the best score out Dedicated question-and-answer forums are more structured than of the three attempts was counted. The course had a total MOOC forums, with question and answer posts identified, but a enrollment of over 48,000, but a much smaller number actively similar approach might help identify which students in a MOOC participated. 13,314 students watched at least one video, 1,242 ask or answer the most questions. students watched all the videos, 1,380 students completed at least one assignment,and 778 made a post or comment in the Choo et al. [5] studied community detection in Amazon product- weekly discussion sections. Of those with posts, 426 completed review forums. Based on which users replied to each other most at least one class assignment. 638 students completed the online often, they found communities of book and movie reviewers who course and received a certificate (meaning that some students had similar tastes in these products. As in MOOC forums, users could earn a certificate without participating in forums at all). did not declare any explicit social relationships represented in the system, but they could still be grouped by implicit connections. In addition to the weekly assignments the students were sent a survey that was designed to assess their personal motivations In the context of complex networks, a community structure is a for enrolling in the course. This survey consisted of 3 sets subgraph which is more densely connected internally than it is to of questions: MOOC-specific motivational items; two PALS the rest of the network. We chose to apply the Girvan-Newman (Patterns of Adaptive Learning Survey) sub-scales [21], Aca- edge-betweenness algorithm (GN) [13]. This algorithm takes as demic Efficacy and Mastery-Goal Orientation; and an item input a weighted graph and a target number of communities. focused on confidence in course completion. It was distributed It then ranks the edges in the graph by their edge-betweenness to students through the course’s E-mail messaging system to value and removes the highest ranking edge. To calculate Edge- students who enrolled in the course prior to the official start betweenness we identify the shortest path p(a,b) between each date. Data on whether participants successfully completed the pair of nodes a and b in the graph. The edge-betweenness course was downloaded from the same course system after the of an arc is defined as the number of shortest paths that it course concluded. The survey received 2,792 responses; 38% of participates in. This is one of the centrality measures explored the participants were female and 62% of the participants were by Kovanovic et al. above [15]. The algorithm then recalcu- male. All of the respondents were over 18 years of age. lates the edge-betweenness values and iterates until the desired number of disjoint community subgraphs has been produced. The MOOC-specific items consisted of 10 questions drawn from Thus the algorithm operates by iteratively finding and removing previous MOOC research studies (cf. [2, 22]) asking respondents the highest-value communications channel between communities to rate their reasons for enrollment. These 10 items address until the graph is fully segmented. For this analysis, we used traits of MOOCs as a novel online learning platform. Specifically, the iGraph library [7] implementation of G-N within R [24]. these 10 items included questions on both the learning content and features of MOOCs as a new platform. Two PALS Survey The strength of a candidate community can be estimated by scales [21] measuring mastery-goal orientation and academic modularity. The modularity score of a given subgraph is defined efficacy were used to study standard motivational constructs. as a ratio of its intra-connectedness (edges within the subgraph) PALS scales have been widely used to investigate the relation to the inter-connectedness with the rest of the graph minus the between a learning environment and a student’s motivation (cf. fraction of such edges expected if they were distributed at ran- [6, 20, 28]). Altogether ten items with five under each scale dom [13, 35]. A graph with a high modularity score represents were included. The participants were asked to select a number a dense sub-community within the graph. from 1 to 5 with 1 meaning least relevant and 5 most relevant. Respondents were also asked to self-rate their confidence on a 3. DATA SET scale of 1 to 10 as to whether they could complete the course according to the pace set by the course instructor. All three This study used data collected from the “Big Data in Education” groups of items were domain-general. MOOC hosted on the Coursera platform as one of the inaugural courses offered by Columbia University [32]. It was created in response to the increasing interest in the learning sciences and 4. METHODS educational technology communities in using EDM methods For our analysis, we extracted a social network from the online with fine-grained log data. The overall goal of this course was forum associated with the course. We assigned a node to each to enable students to apply each method to answer education student, instructor, or TA in the course who added to it. Nodes research questions and to drive intervention and improvement in representing students were labeled with their final course grade educational software and systems. The course covered roughly out of 100 points. The Coursera forums operate as standard threaded forums. Course participants could start a new thread the network would only share edges with vertices of different with an initial post, add a post to an existing thread, and add scores. Thus grade assortativity allows us to measure whether a comment or child element below an existing post. We added individuals are not just connected directly to individuals with a directed edge from the author of each post or comment to the similar scores but whether they correlate with individuals who parent post and to all posts or comments that preceded it on are one step removed. the thread based upon their timestamp. We made a conscious decision to omit the textual content of the replies with the goal Several commonly studied classes of networks tend to have pat- of isolating the impact of the structure alone. terns in their assortativity. Social networks tend to have high assortativity, while biological and technological networks tend We thus treat each reply or followup in the graph as an implicit to have negative values (dissortativity) [23]. In a homogeneous social connection and thus a possible relationship. Such implicit course or one where students only form stratified communities social relationships have been explored in the context of recom- we would expect the assortativity to be very high while in a het- mender systems to detect strong communities of researchers [5]. erogeneous class with no distinct communities we would expect This is, by design, a permissive definition that is based upon it to be quite low. the assumption that individuals generally add to a thread after viewing the prior content within it and that individual threads 4.2 Community Detection can be treated as group conversations with each reply being a The process of community detection we employed is briefly de- conscious statement for everyone who has already spoken. The scribed here [3]. As noted there we elected to ignore the edge resulting network forms a multigraph with each edge represent- direction when making our graph. Our goal in doing so was to ing a single implicit social interaction. We removed self loops focus on communities of learners who shared the same threads, from this graph as they indicate general forum activity but even when they were not directly replying to one-another. We not any meaningful interaction with another person. We also believe this to be a reasonable assumption given the role of class removed vertices with a degree of 0, and collapsed the parallel forums as a knowledge-building environment in which students edges to form a simple weighted graph for analysis. exchange information with the group. Individuals who partic- ipate in a thread generally review prior posts before submitting In the analyses below we will focus on isolating student perfor- their contribution and are likely to return to view the followups. mance and assessing the impact of the faculty and hub students. Homogeneity in this context would mean that students gathered We will therefore consider four classes of graphs: ALL the com- and communicated primarily with equally-performing peers and plete graph; Student the graph with the instructor and TAs thus that they did not consistently draw from better-performing removed; NoHub the graph with the instructor and hub users re- classmates and help lower-performing ones or that the at-will moved; and Survey which includes only students who completed communities served to homogenize performance, with the stu- the motivation survey. We will also consider versions of the above dents in a given cluster evening out over time. graphs without students who obtained a score of 0, and without the isolated individuals who connect with at most one other While algorithms such as GN are useful for finding clusters they person. As we will discuss below, a number of students received do not, in and of themselves, determine the right number of a zero grade in the course. Because this is an at-will course, how- communities. Rather, when given a target number they will seek ever, we cannot readily determine why these scores were obtained. to identify the best possible set of communities. In some imple- They may reflect a lack of engagement with the course, differen- mentations the algorithm can be applied to iteratively select the tial motivations for taking the course, a desire to see the course maximum modularity value over a possible range. Determining materials without assignments, or genuinely poor performance. the correct number of communities to detect, however, is a non-trivial task especially in large and densely connected graphs 4.1 Best-Friend Regression & Assortativity where changes to smaller communities will have comparatively Fire et al. [11] applied a similar social network approach to small effects on the global modularity score. As a consequence traditional classrooms and found a correlation between a stu- we cannot simply optimize for the best modularity score as we dent’s most highly connected neighbor (”best friend”) and the would risk missing small but important communities [12]. student’s grade. The links in that graph included cooperation on assignments as well as partnership on group assignments. Therefore, rather than select the clusterings based solely on To examine whether the same correlation existed in a massive the highest modularity, we have opted to estimate the correct online course in which students were less likely to know each number of clusters visually. To that end we plotted a series of other beforehand and there were no group assignments, we modularity curves over the set of graphs. For each graph G we calculated each student’s best friend in the same manner and applied the GN algorithm iteratively to produce all clusters in performed a similar correlation. the range (2,|GN |). For each clustering, we then calculated the global modularity score. We examined the resulting scores to The simple best friends analysis gives a straightforward mech- identify a crest where the modularity gain leveled off or began to anism for correlating individual students. However it is also decrease thus indicating that future subdivisions added no mean- worthwhile to ask about students who are one-step removed ingful information or created schisms in existing high-quality from their peers. Therefore we will also calculate the grade communities. This is a necessarily heuristic process that is sim- assortativity (rG ) of the graphs. Assortativity describes the cor- ilar to the use of Scree plots in Exploratory Factor Analysis [4]. relation of values between vertices and their neighbors [23]. The We define the number identified as the natural cluster number. assortativity metric r ranges between -1 and 1, and is essentially the Pearson correlation between vertex and their neighbors [23]. 5. RESULTS AND DISCUSSION A network with r =1 would have each vertex only sharing edges Before removing self-loops and collapsing the edges, the network with vertices of the same score. Likewise, if r =−1 vertices in contained 754 nodes and 49,896 edges. The final social network contained 754 nodes and 17,004 edges. 751 of the participants were students, with 1 instructor and 2 TAs. One individual was incorrectly labeled as a student when they were acting as the Chief Community TA. Since this person’s posts clearly indicated that he or she was acting in a TA capacity with regard to the forums, we relabeled him/her as a TA. Of the 751 students 304 obtained a zero grade in the course leaving 447 nonzero students. 215 of the 751 students responded to the motivation survey. There were a total of 55,179 registered users, so the set of 754 forum participants is a small fraction of the entire course audi- ence. However, forum users are not necessarily those who will make an effort or succeed in the course. Forum users did not all participate in the course, and some students who participated in the course did not use the forums: 1,381 students in the course got a grade greater than 0, and 934 of those did not post or comment on the forums, while 304 of the 751 students who did Figure 1: Modularity for each number of clusters, participate in the forums received a grade of 0. Clearly students including students with zeros. who go to the trouble of posting forum content are in some respect making an effort in the course beyond those who don’t, but this does not necessarily correspond to course success. 5.1 Best-Friend Regression & Assortativity We followed Fire et al.’s methodology for identifying Best Friends in a weighted graph and calculated a simple linear regression over the pairs. This correlation did not include the instructor or TAs in the analysis. We calculated the correlation between the students’ grades to their best friends’ grades in the set using Spearman’s Rank Correlation Coefficient (ρ) [34]. The two vari- ables were strongly correlated, ρ(748)=0.44, p<0.001. However, the correlation was also affected by the dense clusters of students with 0 grades. After removing the 0 grade students we found an additional moderate correlation, ρ(444)=0.29, p<0.001. Thus the significant correlation between best-friend grade and grade holds over the transition from the traditional classroom to Figure 2: Modularity for each number of clusters, a MOOC. This suggests that students in a MOOC, excluding the excluding students with zeros. many who drop out or do not submit assignments, behave sim- ilarly to those in a traditional classroom in this respect. These results are also consistent with our calculations for assortativity. 5.2 Community Structure There we found a small assortative trend for the grades as shown The modularity curves for the graphs both with and without in Table 1. These values reflect that a student was frequently zero-score students are shown in Figures 1 and 2. We exam- communicating with students who in turn communicated with ined these plots to select the natural cluster numbers which are students at a similar performance level. This in turn supports our shown in Table 2. As the values illustrate the instructor, TAs, belief that homogeneous communities may be found. As Table and hub students have a disproportionate impact on the graph 1 also illustrates, the zero-score students contribute substan- structure. The largest hub student in our graph connects to tially to the assortativity correlation as well with the correlation 444 out of 447 students in the network. The graph with all dropping by as much as a third when they were removed. users had lower modularity and required more clusters than the graphs with only students or only non-hubs (see Table 2), with Table 2: Graph sizes and natural number of clusters for each graph. Table 1: The grade assortativity for each network. Users Zeros V E Clusters Users Zeros V E rG All Yes 754 17004 212 All Yes 754 17004 0.29 All No 447 5678 173 All No 447 5678 0.20 Students Yes 751 15989 184 Students Yes 751 15989 0.32 Students No 447 5678 169 Students No 447 5678 0.20 Non-Hub Yes 716 9441 79 Non-Hub Yes 716 9441 0.37 Non-Hub No 422 3119 52 Non-Hub No 422 3119 0.24 Survey Yes 215 1679 58 Figure 3: View of the student communities with edges of frequency <2 removed. The Student network with (left) and without (right) hub-students, with each vertex representing a student and grade represented as color. the non-hub graph having the highest modularity. This suggests that non-hub students formed more isolated communities, while Table 3: Grade statistics by community, selected teaching staff and hubs communicated across these communities to show examples of more and less homogeneous and connected them. communities. Members Average Grade Standard Deviation This largely consistent with the intent of the forums and the 118 21.62 36.58 active role played by the instructor and TAs in monitoring and 41 22.00 32.45 replying to all relevant posts in the forums. It is particularly in- 34 25.41 40.44 teresting how closely the curves for the ALL and Student graphs 31 56.13 47.69 mirror one another. This may indicate that the hub students are 20 49.05 45.64 also those that followed the instructor and TAs closely, thus giv- 16 12.44 31.13 ing them isomorphic relationships, or it may indicate that they 14 88.43 22.47 are more connected than even the instructors and thus came to 12 96.08 6.36 bind the forums together on their own. This impact is further 11 96.45 7.38 illustrated by the cluster plots shown in Figure 3. Here the ab- 4 3.00 6.00 sence of the hub students results in a noticeable thinning of the 4 8.50 9.81 graph which in turn highlights the frequency of communication 4 4.25 8.50 that can be attributed to this, comparatively small, group. 4 96.25 3.50 The difference between the full plots and those with zero values are also notable as the zero grade students were clearly a major standard deviation for a small selection of the communities in factor in community formation. A direct examination of the the ALL reply network including zero-grades, hub students, user graph showed that many of the zero students were only and teaching staff. Several of the communities, particularly connected to other zero students or were not connected at all. the larger ones, do show a blend of good and poor students, This is also highlighted in Figure 3. In both graphs the bulk of with a high standard deviation. However many if not most of the zero score students are clustered in a tight network of com- the communities are more homogeneous with good and poor munities on the left-hand side. That super-community consists students sharing a community with similarly-performing peers. primarily of zero score students communicating with other zero- These clusters have markedly lower standard deviation. score students, a structure we have nick-named the ‘deathball.’ An examination of the grade distribution for each of the clusters 5.3 Student Performance & Motivation showed that the scores within each cluster were non-normal. As the color coding in Figure 3 illustrates, the students did Therefore we opted to apply the Kruskal-Wallis (KW) test to cluster by performance. Table 3 shows the average grade and assess the correlation between cluster membership and perfor- We also found that community membership was not a significant Table 4: Kruskal-Wallis test of student grade by predictor of whether students would complete the motivation community, for each graph. survey or of students’ motivations. We were surprised by the Users Zeros Chi-Squared df p-value fact that even when we focused solely on individuals who had All Yes 349.0273 211 < 0.005 completed the survey, the students did not connect by stated All No 216.1534 172 < 0.02 goals. This suggests to us that the students are more likely Students Yes 202.0814 78 < 0.005 coalescing around the pragmatic needs of the class or conceptual Students No 80.93076 51 < 0.005 challenges rather than on the winding paths that brought them Non-Hub Yes 309.8525 183 < 0.005 there. One limitation of this work is that by relying on the Non-Hub No 218.9603 168 < 0.01 forum data we were focused solely on the comparatively small Survey Yes 99.99840 577 < 0.005 proportion of enrolled students (6%) who actively participated in the forums. This group is, by definition a smaller set of more actively-involved participants. mance. The KW test is a nonparametric rank-based analogue In addition to addressing our primary questions this study also to the common Analysis of Variance [17]. Here we tested grade raised a number of open issues for further exploration. Firstly, by community number with the community being treated as a this work focused solely on the final course structure, grades, and categorical variable. The results of this comparison are shown motivations. We have not yet addressed whether these commu- in Table 4. As that illustrates, cluster membership was a sig- nities are stable over time or how they might change as students nificant predictor of student performance for all of the graphs drop in our out. Secondly, while we ruled out motivations as a with the non-zero graphs having markedly lower p-values than basis for the community this work we were not able to identify those with zero students included. These results are consistent what mechanisms do support the communities. And finally this with our hypothesis that students would form clusters of equal- study raises the question of generality and whether or not these performers and we find that those results hold even when the results can be applied to MOOCs offered on different topics or highly-connected instructors, TAs and hub students are included. whether the results apply to traditional and blended courses. We performed a similar KW analysis for the questions on the In subsequent studies we plan to examine both the evolution of motivation survey and for a binary variable indicating whether the networks over time as well as additional demographic data or not the student completed the survey at all. For this analysis with the goal of assessing both the stability of these networks we evaluated the clusters on all of the graphs. We found no and the role of other potential latent factors. We will also significant relationship between the community structure on examine other potential clustering mechanisms that control for any of the graphs and the survey question results or the survey other user features such as frequency of involvement and thread completion variable. Thus while the clusters may be driven by structure. We also plan to examine other similar datasets to separate factors they are not reflected in the survey content. determine if these features transition across classes and class types. We believe that these results may change somewhat once 6. CONCLUSIONS AND FUTURE WORK students can coordinate face to face far more easily than online. Our goal in this paper was to expand upon our prior community detection work with the goal of aligning that work with prior research on peer impacts, notably the work of Fire et al. [11]. 7. ACKNOWLEDGMENTS We also sought to examine the impact of hub students and This work was supported by NSF grant #1418269: “Modeling student motivations on our prior results. Social Interaction & Performance in STEM Learning” Yoav Bergner, Ryan Baker, Danielle S. McNamara, & Tiffany Barnes To that end we performed a novel community clustering analysis Co-PIs. of student performance data and forum communications taken from a single well-structured MOOC. As part of this analysis we 8. REFERENCES described a novel heuristic method for selecting natural numbers [1] R. Azen and D. Budescu. The dominance of clusters, and replicated the results of prior studies of both analysis approach for comparing predictors in multiple immediate neighbors and second-order assortativity. regression. Psychological Methods, 8(2):129–48, 2003. [2] Y. Belanger and J. Thornton. Consistent with prior work, we found that students’ grades Bioelectricity: A quantitative approach Duke University’s were significantly correlated with their most closely associated first MOOC. Journal of Learning Analytics, 2013. peers in the new networks. We also found that this correlation [3] R. Brown, C. F. Lynch, M. Eagle, J. Albert, T. Barnes, extended out to their second-order neighborhood. This is consis- R. Baker, Y. Bergner, and D. McNamara. Good tent with our prior work showing that students form stable user communities and bad communities: Does membership communities that are homogeneous by performance. We found affect performance? In C. Romero and M. Pechenizkiy, that those results were stable even if instructors, hub players, editors, Proceedings of the 8th International students with 0 scores, and students who did not fill out the sur- Conference on Educational Data Mining, 2015. submitted. vey were removed from consideration. This suggests that either [4] R. B. Cattell. The scree test for the number of factors. the students are forming communities that are homogeneous or Multivariate Behavioral Research, 1(2):245–276, 1966. that the effect of those individual and network features on the communities and on performance is minimal. [5] E. Choo, T. Yu, M. Chi, and Y. Sun. Revealing and incorporating implicit communities to improve recommender systems. In M. Babaioff, V. Conitzer, and D. Easley, editors, ACM Conference on Economics and Computation, EC ’14, Stanford, structure, student motivation, and academic achievement. CA, USA, June 8-12, 2014, pages 489–506. ACM, 2014. Annual Review of Psychology, 57:487–503, 2006. [6] K. Clayton, F. Blumberg, [21] C. Midgley, M. L. and D. P. Auld. The relationship between motivation Maehr, L. Hruda, E. Anderinan, L. Anderman, and K. E. learning strategies and choice of environment whether Freeman. Manual for the Patterns of Adaptive Learning traditional or including an online component. British Scales (PALS). University of Michigan, Ann Arbor, 2000. Journal of Educational Technology, 41(3):349–364, 2010. [22] MOOC @ Edinburgh 2013. MOOC @ Edinburgh [7] G. Csardi and T. Nepusz. 2013 - report #1. Journal of Learning Analytics, 2013. The igraph software package for complex network [23] M. E. Newman. Assortative Mixing in Networks. research. InterJournal, Complex Systems:1695, 2006. Physical Review Letters, 89(20):208701, Oct. 2002. [8] S. Dawson. ’Seeing’ the learning [24] R Core Team. R: A Language community: An exploration of the development of a and Environment for Statistical Computing. R Foundation resource for monitoring online student networking. British for Statistical Computing, Vienna, Austria, 2012. Journal of Educational Technology, 41(5):736–752, 2010. [25] B. Rienties, P. Alcott, and D. Jindal-Snape. [9] J. Eckles and E. Stradley. A To let students self-select or not: That is the social network analysis of student retention using archival question for teachers of culturally diverse groups. Journal data. Social Psychology of Education, 15(2):165–180, 2012. of Studies in International Education, 18(1):64–83, 2014. [10] A. Fini. The technological [26] T. Rizzuto, J. LeDoux, and J. Hatala. It’s not just what you dimension of a massive open online course: The know, it’s who you know: Testing a model of the relative case of the CCK08 course tools. The International Review importance of social networks to academic performance. Of Research In Open And Distance Learning, 10(5), 2009. Social Psychology of Education, 12(2):175–189, 2009. [11] M. Fire, [27] C. P. Rosé, R. Carlson, D. Yang, M. Wen, L. Resnick, G. Katz, Y. Elovici, B. Shapira, and L. Rokach. Predicting P. Goldman, and J. Sherer. Social factors that contribute to student exam’s scores by analyzing social network data. In attrition in MOOCs. In Proc. of the first ACM conference Active Media Technology, pages 584–595. Springer, 2012. on Learning@ scale conference, pages 197–198. ACM, 2014. [12] S. Fortunato and M. Barthélemy. [28] A. M. Ryan and H. Patrick. The classroom Resolution limit in community detection. Proc. social environment and changes in adolescents’ motivation of the National Academy of Sciences, 104(1):36–41, 2007. and engagement during middle school. American [13] M. Girvan and M. E. J. Newman. Community structure Educational Research Journal, 38(2):437–460, 2001. in social and biological networks. Proc. of the National [29] D. Seaton, Y. Bergner, I. Chuang, P. Mitros, and Academy of Sciences, 99(12):7821–7826, June 2002. D. Pritchard. Who does what in a massive open online [14] J. Huang, A. Dasgupta, A. Ghosh, course? Communications of the ACM, 57(4):58–65, 2014. J. Manning, and M. Sanders. Superposter behavior [30] G. Stahl, T. Koschmann, in MOOC forums. In Proc. of the first ACM conference and D. Suthers. Computer-supported collaborative on Learning@ scale conference, pages 117–126. ACM, 2014. learning: An historical perspective. Cambridge [15] V. Kovanovic, S. Joksimovic, D. Gasevic, and M. Hatala. handbook of the learning sciences, 2006:409–426, 2006. What is the source of social capital? the association [31] L. Van Dijk, G. Van Der Berg, and H. Van Keulen. between social network position and social presence in Interactive lectures in engineering education. European communities of inquiry. In S. G. Santos and O. C. Santos, Journal of Engineering Education, 26(1):15–28, 2001. editors, Proc. of the Workshops held at Educational [32] Y. Wang and R. Baker. Content or platform: Data Mining 2014, co-located with 7th International Why do students complete MOOCs? MERLOT Journal Conference on Educational Data Mining (EDM of Online Learning and Teaching, 11(1):191–218, 2015. 2014), London, United Kingdom, July 4-7, 2014., volume [33] Y. Wang, L. Paquette, and R. Baker. 1183 of CEUR Workshop Proc. CEUR-WS.org, 2014. A longitudinal study on learner career advancement [16] A. D. I. Kramer, J. E. Guillory, in MOOCs. Journal of Learning Analytics, 1(3), 2014. and J. T. Hancock. Experimental evidence of massive-scale [34] Wikipedia. Spearman’s emotional contagion through social networks. Proc. of the rank correlation coefficient — Wikipedia, the free National Academy of Sciences, 111(24):8788–8790, 2014. encyclopedia, 2013. [Online; accessed 27-February-2013]. [17] W. H. Kruskal and W. A. Wallis. Use [35] Wikipedia. Modularity (networks) — Wikipedia, the free of ranks in one-criterion variance analysis. Journal of the encyclopedia, 2014. [Online; accessed 5-February-2015]. American statistical Association, 47(260):583–621, 1952. [36] D. Yang, T. Sinha, D. Adamson, and C. P. Rose. Turn on, [18] N. Li, H. Verma, A. Skevi, tune in, drop out: Anticipating student dropouts in massive G. Zufferey, J. Blom, and P. Dillenbourg. Watching open online courses. In Proc. of the 2013 NIPS Data-Driven MOOCs together: investigating co-located MOOC Education Workshop, volume 10, page 13, 2013. study groups. Distance Education, 35(2):217–233, 2014. [37] W. W. Zachary. An information [19] T. R. Liyanagunawardena, A. A. Adams, and S. A. flow model for conflict and fission in small groups. Williams. MOOCs: A systematic study of the published Journal of Anthropological Research, 33:452–473, 1977. literature 2008-2012. The International Review of Research [38] J. Zhang, M. S. Ackerman, and L. Adamic. in Open and Distributed Learning, 14(3):202–227, 2013. Expertise networks in online communities: structure and [20] J. L. Meece, algorithms. In Proc. of the 16th international conference E. M. Anderman, and L. H. Anderman. Classroom goal on World Wide Web, pages 221–230. ACM, 2007. Using the Hint Factory to Compare Model-Based Tutoring Systems Collin Lynch Thomas W. Price North Carolina State North Carolina State University University 890 Oval Drive 890 Oval Drive Raleigh, NC 27695 Raleigh, NC 27695 cflynch@ncsu.edu twprice@ncsu.edu Min Chi Tiffany Barnes North Carolina State North Carolina State University University 890 Oval Drive 890 Oval Drive Raleigh, NC 27695 Raleigh, NC 27695 mchi@ncsu.edu tmbarnes@ncsu.edu ABSTRACT Model-based tutoring systems are based upon classical ex- Model-based tutoring systems are driven by an abstract do- pert systems, which represent relevant domain knowledge main model and solver that is used for solution validation via static rule bases or sets of constraints [9]. These knowl- and student guidance. Such models are robust but costly edge bases are generally designed by domain experts or with to produce and are not always adaptive to specific students’ their active involvement. They are then paired with classical needs. Data-driven methods such as the Hint Factory are search algorithms or heuristic satisfaction algorithms to au- comparatively cheaper and can be used to generate indi- tomatically solve domain problems, identify errors in student vidualized hints without a complete domain model. In this solutions, and to provide pedagogical guidance. The goal of paper we explore the application of data-driven hint analy- the design process is to produce expert models that give the sis of the type used in the Hint Factory to existing model- same procedural advice as a human expert. Classical model- based systems. We present an analysis of two probability based tutors have been quite successful in field trials, with tutors Andes and Pyrenees. The former allows for flexible systems such as the ACT Programming Tutor helping stu- problem-solving while the latter scaffolds students’ solution dents achieve almost two standard deviations higher than path. We argue that the state-space analysis can be used to those receiving conventional instruction [5]. better understand students’ problem-solving strategies and can be used to highlight the impact of different design de- Data-driven hint generation methods such as those used in cisions. We also demonstrate the potential for data-driven Hint Factory [17] take a different approach. Rather than us- hint generation across systems. ing a strong domain model to generate a-priori advice, data- driven systems examine prior student solution attempts to identify likely paths and common errors. This prior data 1. INTRODUCTION can then be used to provide guidance by directing students Developers of model-based tutoring systems draw on do- towards successful paths and away from likely pitfalls. In main experts to develop ideal models for student guidance. contrast to the expert systems approach, these models are Studies of such systems have traditionally been focused on primarily guided not by what experts consider to be ideal their overall impact on students’ performance and not on the but by what students do. students’ user-system interaction. The Hint Factory, by con- trast, takes a data-driven approach to extract advice based Model-based systems such as Andes [18] are advantageous upon students’ problem solving paths. In this paper we will as they can provide appropriate procedural guidance to stu- apply the Hint Factory analytically to evaluate the impact dents at any point in the process. Such models can also be of user interface changes and solution constraints between designed to reinforce key meta-cognitive concepts and ex- two closely-related tutoring systems for probability. plicit solution strategies [4]. They can also scale up rapidly to include new problems or even new domain concepts which can be incorporated into the existing system and will be available to all future users. Rich domain models, however, are comparatively expensive to construct and require the long-term involvement of domain experts to design and eval- uate them. Data-driven methods for generating feedback, by contrast, require much lower initial investment and can readily adapt to individual student behaviors. Systems such as the Hint Factory are designed to extract solutions from prior student data, to evaluate the quality of those solutions, and to com- pile solution-specific hints [17]. While this avoids the need for a strong domain model, it is limited to the space of so- lutions explored by prior students. In order to incorporate new problems or concepts it is necessary to collect additional data. Additionally, such methods are not generally designed to incorporate or reinforce higher-level solution strategies. We believe that both of these approaches have inherent ad- vantages and are not necessarily mutually exclusive. Our goal in this paper is to explore what potential data-driven methods have to inform and augment model-based systems. We argue that data-driven methods can be used to: (1) evaluate the differences between closely-related systems; (2) assess the impact of specific design decisions made in those Figure 1: The Andes user interface showing the systems for user behaviors; and (3) evaluate the potential ap- problem statement window with workspace on the plication of data-driven hint generation across systems. To upper left hand side, the variable and equation win- that end we will survey relevant prior work on model-based dows on the right hand side, and the dialogue win- and data-driven tutoring. We will describe two closely- dow on the lower left. related tutoring systems and data collected from them. We will then present a series of analyses using state-based meth- ods and discuss the conclusions that we drew from them. is associated with a pre-compiled solution graph that defines the set of possible solutions and problem-solving steps. The 2. BACKGROUND system uses a principle-driven automated problem solver to compile these graphs and to identify the complete solution 2.1 Model-Based Tutoring paths. The solver is designed to implement the Target Vari- Model-based tutoring systems take a classical expert-systems able Strategy (TVS), a backward-chaining problem solving approach to tutoring. They are typically based upon a strategy that proceeds from a goal variable (in this case strong domain model composed of declarative rules and facts the answer to the problem) via principle applications to the representing domain principles and problem-solving actions given information. The TVS was designed with the help of coupled with an automatic problem solver. This knowledge domain experts and guides solvers to define basic solution base is used to structure domain knowledge, define individ- information (e.g. given variables) and then to proceed from ual problems, evaluate candidate solutions, and to provide the goal variable and use principles to define it in terms of student guidance. Novices typically interact with the system the given variables. through problem solving with the system providing solution validation, automatic feedback, pedagogical guidance, and Students working with Andes use a multi-modal user inter- additional problem-solving tasks. The Sherlock 2 system, for face to write equations, define variables and engage in other example, was designed to teach avionics technicians about atomic problem-solving steps. A screenshot of the Andes UI appropriate diagnostic procedures [11]. The system relies on can be seen in Figure 1. Andes allows students to solve prob- a domain model that represents the avionics devices being lems flexibly, completing steps in any order so long as they tested, the behavior of the test equipment, and rules about are valid [20]. A step is considered to be valid if it matches expert diagnostic methods. Sherlock 2 uses these models one or more entries in the saved solution paths and all nec- to pose dynamic challenges to problem solvers, to simulate essary prerequisites have been completed. Invalid steps are responses to their actions, and to provide solution guidance. marked in red, but no other immediate feedback is given. Andes does not force students to delete or fix incorrect en- Andes [19, 18, 20] and Pyrenees [4] are closely-related model- tries as they do not affect the solution process. In addi- driven ITSs in the domains of physics and probability. They tion to validating entries, the Andes system also uses the were originally developed at the University of Pittsburgh un- precompiled solution graphs to provide procedural guidance der the Direction of Dr. Kurt VanLehn. Like other model- (next-step-help). When students request help, the system based systems, they rely on a rule-based domain model and will map their work to the saved solution paths. It will then automatic problem solvers that treat the domain rules as select the most complete solution and prompt them to work problem-solving steps. They distinguish between higher- on the next available step. level domain concepts such as Bayes’ Rule, and atomic steps such as variable definitions. Principles are defined by a cen- One of the original goals of the Andes system was to de- tral equation (e.g. p(A|B) = (p(B|A) ∗ p(A))/p(B)) and velop a tutor that operated as an “intelligent worksheet.” encapsulate a set of atomic problem-solving steps such as The system was designed to give students the freedom to writing the equation and defining the variables within it. solve problems in any order and to apply their preferred solution strategy. The system extends this freedom by al- The systems are designed to function as homework-helpers, lowing invalid steps in an otherwise valid solution and by with students logging into the system and being assigned or allowing students to make additional correct steps that do selecting one of a set of predefined problems. Each problem not advance the solution state or are drawn from multiple state of a student’s partial solution at some point during the problem solving process, and each edge represents an action that takes the student from one state to another. A complete solution is represented as a path from the initial state to a goal state. Each state in the interaction network is assigned a weight via a value-iteration algorithm. A new student re- questing a hint is matched to a previously observed state and given context-sensitive advice. If, for example, the student is working on a problem that requires Bayes’ Rule and has al- ready defined p(A), p(B), and p(B|A) then the Hint Factory would first prompt them to consider defining p(A|B), then it would point them to Bayes Rule, before finally showing them the equation p(A|B) = (p(B|A) ∗ p(A))/p(B). These hints are incorporated into existing tutoring systems Figure 2: The Pyrenees user interface showing the in the form of a lookup table that provides state-specific problem statement at the top, the variable and equa- advice. When a user asks for help the tutor will match their tion lists on the left, and the tutor interaction win- current state to an index state in the lookup table and will dow with calculator on the lower right. prompt them to take the action that will lead them to the highest value neighboring state. If their current state is not found then the tutor will look for a known prior state or solution paths. This was motivated in part by a desire to will give up. The Hint Factory has been applied successfully make the system work in many different educational con- in a number of domains including logic proofs [17], data texts where instructors have their own preferred methods structures [8], and programming [15, 10, 13]. Researchers [20]. The designers of Andes also consciously chose only to have also explored other related methods for providing data- provide advice upon demand when the students would be driven hints. These include alternative state representations most willing to accept it. For the students however, par- [13], path construction algorithms [16, 14], and example- ticularly those with poor problem-solving skills, this passive based model-construction [12]. guidance and comparative freedom can be problematic as it does not force them to adhere to a strategy. The primary goal of the Hint Factory is to leverage prior data to provide optimal state-specific advice. By calculating This problem motivated the development of Pyrenees. Pyre- advice on a per-state basis, the system is able to adapt to nees, like Andes acts as a homework helper and supports stu- students’ specific needs by taking into account both their dents with on-demand procedural and remediation help. It current state and the paths that they can take to reach the uses an isomorphic domain model with the same principles, goal. As a consequence the authors of the Hint Factory basic steps, problems, and solution paths. Unlike Andes, argue that this advice is more likely to be in the students’ however, Pyrenees forces students to applying the target- Zone of Proximal Development and thus more responsive to variable-strategy during problem solving. It also requires their needs than a less-sensitive algorithm. them to repair incorrect entries immediately before moving on. Students are guided through the solution process with 3. METHODS a menu-driven interface, shown in Figure 2. At each step, In order to investigate the application of data-driven meth- the system asks students what they want to work on next ods to model-based tutoring systems, we collected data from and permits them to make any valid step that is consistent two studies conducted with Andes and Pyrenees in the do- with the TVS. Chi and VanLehn [3] conducted a study of the main of probability. We then transformed these datasets two systems and found that scaffolding the TVS in Pyrenees into interaction networks, consisting of states linked with helped to eliminate the gap between high and low learners. actions. We used this representation to perform a variety of This effect was observed both in the original domain where quantitative and qualitative analyses with the goal of eval- it was taught (in their case probability) and it transferred to uating the differences between the two systems and the im- a new domain (physics), where students used Andes alone. pact of the specific design decisions that were made in each. 2.2 Data-Extraction and 3.1 The Andes and Pyrenees Datasets Data-Driven Tutoring. The Andes dataset was drawn from an experiment con- One of the longstanding goals of educational data-miners is ducted at the University of Pittsburgh [3]. This study was to support the development of data-driven tutoring systems. designed to assess the differential impact of instruction in Such systems use past student data to structure pedagogical Andes and Pyrenees on students’ meta-cognitive and problem- and domain knowledge, administer conceptual and pedagog- solving skills. Participants in this study were college under- ical advice, or evaluate student performance and needs. A graduates who were required to have taken high-school level number of attempts have been made to address these goals. algebra and physics but not to have taken a course in prob- One of the most successful data-driven systems is the Hint ability or statistics. The participants were volunteers and Factory [1, 2, 17]. The Hint Factory takes an MDP-based were paid by time not performance. approach to hint generation. It takes as input a set of prior student logs for a given problem, represented as a network of Forty-four students completed the entire study. However interactions [6, 7]. Each vertex in this network represents the for the purposes of the present analysis, we drew on all 66 students who completed at least one problem in Andes- tions, this representation uniquely identifies any equation Probability. This is consistent with prior uses of the Hint for a given problem. Because we used the same state rep- Factory which draw from all students including those who resentation for both tutors, we were able to compare states did not complete the problem. The Pyrenees-Probability directly across tutors. logs from this study were not used due to problems with the data format that prevented us from completing our analysis. Additionally, we opted to ignore incorrect entries. Pyrenees From this dataset we drew 394 problem attempts covering prevents students from applying the principles of probabil- 11 problems. The average number of steps required to solve ity improperly and forces them to correct any mistakes made the problems was 17.6. For each problem we analyzed be- immediately therefore any errors in the student logs are im- tween 25 and 72 problem attempts, with an average of 35.8 mediately removed making the paths uninformative. Andes, attempts per problem. Some attempts were from the same by contrast, gives students free reign when writing equations student, with at most two successful attempts per student. and making other entries. This freedom resulted in syntac- Over all problems, 81.7% of the attempts were successful, tic errors and improper rule application errors arising in our with the remainder being incomplete attempts. dataset. The meaning of these invalid equations is inherently ambiguous and therefore difficult to incorporate into a state The Pyrenees dataset was drawn from a study of 137 stu- definition. However such errors are immediately flagged by dents conducted in the 200-level Discrete Mathematics course the system and may be ignored by the student without con- in the Department of Computer Science at North Carolina sequence as they do not affect the answer validity therefore State University. This study used the same probability text- they may be safely ignored as well. book and pre-training materials as those used in the Andes study. The students used Pyrenees as part of a homework An edge, or action, in our network represents the correct assignment, in which they completed 12 problems using the application of a rule or a correct variable definition and leads tutoring system. One of these problems was not represented to a transition from one state to another. For the present in the Andes dataset. We therefore excluded it from our dataset and state representation, the possible actions were analysis, leaving 11 shared problems. the definition or deletion of variables or equations. Each of these actions was possible in both tutors. Unlike the Andes students, however, the Pyrenees students were not always required to solve every problem. In this 4. ANALYSIS study the system was configured to randomly select some In order to develop a broader understanding of our datasets, problems or problem steps to present as worked examples we first visualized the interaction network for each problem rather than as steps to be completed. In order to ensure as a weighted, directed graph. We included attempts from that the results were equivalent we excluded the problem- both Andes and Pyrenees in the network, and weighted the level worked examples and any attempt with a step-level edges and verticies by the frequency with which it appeared worked example from our analysis. As a consequence, each in the logs. We annotated each state and edge with the problem included a different subset of these students. For weight contributed by each tutor. Two examples of these each problem we analyzed between 83 and 102 problem at- graphs are given in Figure 3. tempts, with an average of 90.8 attempts per problem. Some attempts were from the same student, with at most one suc- Throughout this section, we will use these graphs to address cessful attempt per student. Over all problems, 83.4% of the points we outlined at the end of Section 1. We begin the attempts were successful. with a case study from one problem and will explore the student problem solving strategies using our graph repre- 3.2 State and Action Representations sentation. We will then compare the Andes and Pyrenees In order to compare the data from both tutors, we repre- Systems with a variety of metrics based on this represen- sented each problem as an interaction network, a represen- tation. We will relate our observations back to the design tation used originally in the Hint Factory [7]. In the net- decisions of each system and identify evidence that may sup- work a vertex, or state, represents the sum total of a stu- port or question these decisions. Finally, we will show how dents’ current problem solving steps at a given time during the analysis methods associated with data-driven hint gen- a problem-solving attempt. Because Andes permits flexible eration can be used to validate some of these findings. step ordering while Pyrenees does not, we chose to represent the problem solving state st as the set of valid variables and 4.1 Case Study: Problem Ex242 equations defined by the student at time t. The graphical representation of a problem is very helpful for giving a high-level overview of a problem and performing A variable is a probabilistic expression, such as P (A ∪ B), qualitative analysis. Problem Ex242, shown in Figure 3, that the student has identified as important to solving the presents an interesting scenario for a number of reasons. The problem, for which the probability is known or sought. An problem was the 10th in a series of 12 practice problems, and equation represents the application of a principle of proba- asked the following: bility, which relates the values of defined variables, such as the Complement Theorem, P (A) + P (¬A) = 1. Because such equations can be written in many algebraically equiv- Events A, B and C are mutually exclusive and alent ways, we represent each equation as a 2-tuple, con- exhaustive events with p(A) = 0.2 and p(B) = sisting of the set of variables included in the equation (e.g. 0.3. For an event D, we know p(D|A) = 0.04, {P (A), P (¬A)}) and the principle being applied (e.g. Com- p(D|B) = 0.03, and p(C|D) = 0.3. Determine plement Theorem). Because we only represent valid equa- p(B|D). clude them as they represent a fair proportion of the Andes data. For instance, 62 of the 126 Andes states for Ex242 were singleton states. Interestingly, while there were small variations among their solutions, all of the Andes students choose to apply Bayes’ Rule rather than relying solely on the Conditional Probabil- ity Theorem as suggested by Pyrenees. This, coupled with the strong proportion of Pyrenees students who also chose the Bayes’ Rule solution, indicates that the solution offered by Pyrenees may be unintuitive for students, especially if they have recently learned Bayes’ Rule. Again, this can be interpreted as evidence that Pyrenees’ strong guidance did have an impact on students’ problem solving strategies, but it also raises concerns about how reasonable this guidance will appear to the students. Regardless of one’s interpreta- tion, an awareness of a trend like this can help inform the evolution of model-based tutors like Andes and Pyrenees. 4.2 Comparing datasets Figure 3: A graph representation of problems We now turn to qualitatively comparing the datasets. While Ex252a, left, and Ex242, right. States and edges are it is not a common practice to directly compare data from colored on a gradient from blue to yellow, indicating different tutors, we argue that it is appropriate, especially the number of students who reached that state in the in this context. In longstanding tutoring projects it is com- Andes and Pyrenees tutors respectively. Rounded mon for developers and researchers to make many substan- edges indicate that at least one student from both tive changes. The Andes system itself has undergone sub- tutors is present in a state. A green border indicates stantial interface changes over the course of its development a solution state, and a pink border indicates that a [20]. These changes can alter student behavior in substan- state is contained in the pedagogically “ideal” solu- tial ways, and it is important for researchers to consider tion. Edge thickness corresponds to the natural log how they affect not just learning outcomes but also problem of the number of problem attempts which included solving strategies, as was investigated by Chi et al. [4]. the given edge. In many respects the close relationship between Andes and Pyrenees makes them analogous to different versions of the The problem is notable in the Pyrenees dataset because it same tutor and the presence of an isomorphic knowledge was the only one in which the students were split almost base and problem set makes it possible for us to draw mean- evenly among two solution paths. For most of the problems ingful comparisons between students. In this section we will in the dataset the vast majority of students followed the inspect how the scaffolding design decisions made when con- optimal solution path with only a few finding alternatives. structing the tutors affected the problem solving strategies The ideal solution path, as suggested by Pyrenees’ domain exhibited by the students. model, employed repeated applications of the Conditional Probability Theorem: P (A ∩ B) = P (A|B)P (B), which the A visual inspection of the state graphs for each problem problem was designed to teach. The students had been pre- revealed significant portions of each graph were shared be- viously exposed to Bayes’ Theorem however, and over half tween the two datasets and portions that were represented of them chose to apply it instead. This allowed them to in only one of the two. Despite the fact that students using circumvent one variable definition and two applications of Andes were capable of reaching any of the states available the Conditional Probability Theorem, achieving a slightly to students in Pyrenees, many Pyrenees states were never shorter solution path. We make no argument here which discovered by Andes students. As noted in Section 4.1, this path the tutor should encourage students to take, but it is suggests that guidance from the Pyrenees tutor is successful worth noting that the Hint Factory, when trained on the in leading students down solution paths that they would not Pyrenees data for this problem, recommends the shorter, otherwise have discovered, possibly applying skills that they more popular path. would otherwise not have used. The Andes dataset gives us a very different set of insights To quantify these findings, we calculated the relative sim- into this problem. Because Andes lacks the strong pro- ilarity of students in each tutor. For a given problem, we cess scaffolding of Pyrenees, students were able to make a defined the state-similarity between datasets A and B as the wider variety of choices, leading to a graph with many more, probability that a randomly selected state from a student in less populous states. While almost every state and edge in A will be passed through by a randomly selected student the Pyrenees graph represents multiple students, the An- in B. Recall from Section 3.2 that our state representation des graph contains a number of paths, including solutions, allows us to directly compare states across tutors. By this that were reached by only one student. In some state-based definition, the self-similarity of a dataset is a measure of analyses the authors choose to omit these singleton states, how closely its students overlap each other while the cross- for instance when generating hints. We have chosen to in- similarity is a measure of how closely its students overlap States Andes Pyrenees Solution States Andes Pyrenees Solution Andes 0.551 (0.134) 0.494 (0.153) 0.478 (0.186) Andes 0.419 (0.150) 0.327 (0.139) 0.253 (0.172) Pyrenees 0.460 (0.141) 0.688 (0.106) 0.669 (0.146) Pyrenees 0.372 (0.193) 0.709 (0.145) 0.601 (0.214) Actions Andes Pyrenees Solution Actions Andes Pyrenees Solution Andes 0.878 (0.085) 0.874 (0.118) 0.851 (0.140) Andes 0.818 (0.151) 0.582 (0.168) 0.636 (0.205) Pyrenees 0.828 (0.117) 0.936 (0.021) 0.923 (0.036) Pyrenees 0.678 (0.212) 0.899 (0.038) 0.879 (0.055) Table 1: Pairwise similarity across tutors and the Table 2: Pairwise similarity across tutors and the ideal solution path. Similarities were calculated for ideal solution path calculated using a variable-free each problem, and each cell lists the mean (and stan- state representation. Rows and columns are the dard deviation) over all problems. The top half cov- same as in Table 1. ers the state similarity metrics while the bottom half of each table covers action similarity. Problem Andes Pyrenees ex132 26 (47.5%) 2 (98.7%) ex132a 13 (33.3%) 1 (100.0%) with the other dataset. Similarity measures for the datasets ex144 2 (96.6%) 1 (100.0%) can be found at the top of Table 1. ex152 11 (0.0%) 4 (0.0%) ex152a 8 (59.0%) 3 (97.4%) Predictably, both datasets have higher self-similarity than ex152b 12 (0.0%) 1 (100.0%) cross-similarity, with Pyrenees showing higher self-similarity ex212 8 (71.4%) 1 (100.0%) than Andes. This indicates that Pyrenees students chose ex242 9 (0.0%) 2 (49.38%) more homogeneous paths to the goal. This is reasonable ex252 7 (76.9%) 2 (98.4%) and consistent with the heavy scaffolding that is built into ex252a 4 (81.8%) 2 (98.6%) the system. It is important to note that our similarity met- exc137 19 (0.0%) 2 (98.75%) rics are not symmetric. The cross-similarity of Pyrenees with Andes is higher than the reverse. This indicates that Table 3: For each problem, the tables gives the num- the path taken by Pyrenees students are more likely to have ber of unique solution states represented in each tu- been observed by Andes students than vice-versa. This has tor’s dataset. Note that there may exist many solu- important implications for designers who are interested in tion paths which reach a given solution. The follow- collecting data from a system that is undergoing modifica- ing percent (in parentheses) represents the percent tions. If a system becomes increasingly scaffolded and re- solution paths that ended in the pedagogically ideal strictive over time, past data will remain more relevant than solution. in a system that is relaxed. In many ways this simply re- flects the intuition that allowing students to explore a state space more fully will produce more broadly useful data, and with the design goals of Pyrenees which was set up to guide restricting students will produce data that is more narrowly students along the otherwise unfamiliar path of the TVS. useful. Note that here we are only observing trends, and we make no claims of statistical significance. We also opted to examine the impact of the variable defini- tions on our evaluation. As noted above, variable definitions In our analysis, we found that, within both datasets, many are an atomic action. They do not depend upon any event solution paths or sub-paths differed only in the order that assertion and thus have no ordering constraints unlike the actions were performed. In our domain, many actions do principles. We did so with the hypothesis that this would not have ordering constraints. It is possible, for example, increase the similarity metrics for the datasets by eliminat- to define the variables A and B in either order, and the re- ing the least constrained decisions from consideration. Our sulting solution paths would deviate from one another. We results are shown in Table 2 below. Contrary to our ex- thus sought to determine how much of the observed differ- pectations, this actually reduced the similarity both within ence between our two datasets was due to these ordering and across the datasets, with the exception of Pyrenees’ self- effects. To that end we define the action-similarity between similarity. Thus the unconstrained variable definitions did datasets A and P as the probability that a randomly se- not substantially contribute to the dissimilarity. Rather, lected action performed by a student in A will have been most of the variation lay in the order of principle applica- performed by a by a randomly selected student in B. These tions. values are shown in the bottom of Table 1, and each of the trends observed for state-similarity hold, with predictably 4.3 Similarity to an “ideal" solution higher similarity values. We now turn to exploring how well the ideal solution was represented in the datasets. For both tutors the ideal solu- It is notable that the similarity between Pyrenees and Andes tion is the pedagogically-desirable path constructed via the is almost as high as Andes’ self-similarity, indicating that the TVS. Our measure of cross-similarity between two datasets actions taken by Pyrenees students are almost as likely to can also be applied between a single solution path and a be observed in Andes students as Pyrenees students. This dataset by treating the single solution as a set of one. We suggests that, for the most part, the Andes students per- can thus measure the average likelihood of an ideal solution formed a superset of the actions performed by the Pyrenees state appearing in a student’s solution from each dataset. students. Thus the impact of Pyrenees is most visible in the The results of this calculation are shown in tables 1 and 2, order of execution, not the actions chosen. This is consistent using both the state- and action-similarities explained ear- lier. Predictably, the solution has a high similarity with Pyrenees students, as these students are scaffolded tightly and offered few chances to deviate from the path. As Table 3 shows, Pyrenees students were funneled almost exclusively to the ideal solution on the majority of prob- lems, even if their paths to the solution were variable. We found only one problem, Ex152, where the Pyrenees stu- dents missed the ideal path. That was traced to a pro- gramming error that forced students along a similar path. Otherwise, there was only one problem, Ex242 (discussed in Section 4.1), where a meaningful percentage of students chose a different solution. The Andes students, by contrast, were much less likely to finish in the ideal solution state, but this was also problem-dependent. 4.4 Applications of the Hint Factory Finally, having shown that the datasets differ, and that these Figure 4: The four Cold Start curves, averaged differences are consistent with the differing design choices of across all problems. The x-axis shows the number of the two tutors, we sought to determine what effect those students used to train the model, and the y-access differences would have on data-driven hint generation. Our shows the percentage of a new student’s path that goal was to determine how applicable a hint model of the has available hints. The curve labeled “XvY” indi- type produced by the Hint Factory would be for one dataset cates training on the X dataset and selecting a new if it was trained on another. To that end we performed a student from the Y dataset (A = Andes; P = Pyre- modified version of the Cold Start Experiment [1], which nees). is designed to measure the number of state-specific hints that Hint Factory can provide given a randomly selected dataset. The Cold Start experiment functions like leave- one-out cross-validation for state-based hint generation. In system in the same domain. Clearly a substantive interface the original Cold Start experiment, one student was selected and scaffolding change of the type made in Pyrenees can at random and removed from the dataset, to represent a change the state space sufficiently that we cannot trivially “new” student using the tutor. Each remaining student in rely on our prior data. On the other hand, while the cross- the dataset was then added, one at a time, in a random application of data does have upper limits, those limits are order to the Hint Factory’s model. On each iteration, the comparatively high. Clearly data from a prior system can be model is updated and the percentage of states on the ‘new’ reused and can serve as a reliable baseline for novel system, student’s path for which a hint is available is calculated. with the caveat that additional exploratory data is required. This is repeated a desired number of times with new students to account for ordering effects. 5. DISCUSSION AND CONCLUSION For the present study we calculated cold-start curves for Our goal in this paper was to evaluate the application of both the Pyrenees and Andes datasets. We also calculated data-driven methods such as the Hint Factory to model- curves using the opposing dataset to illustrate the growth based tutoring systems. To that end we analyzed and com- rate for cross-tutor hints. For these modified curves we se- pared datasets collected from two closely-related tutoring lected the hint-generating students from the opposing dataset. systems: Andes and Pyrenees. Through our analysis we All four curves are shown in Figure 4. Here AvA and PvP sought to: (1) evaluate the differences between closely-related designate the within tutor curves for Andes and Pyrenees systems; (2) assess the impact of specific design decisions respectively while PvA and AvP designate the cross-tutor made in those systems for user behaviors; and (3) evalu- curves for hints from Pyrenees provided to Andes users and ate the potential application of data-driven hint generation vice-versa. Figure 4 represents an average over all problems, across systems. and therefore the x-axis extends only as far as the minimum number of students to complete a problem. As the curves We found that, while the systems shared isomorphic domain illustrate, the within-tutor curves reach high rates of cov- models, problems, and ideal solutions, the observed user be- erage relatively quickly with PvP reaching a plateau above haviors differed substantially. Students using the Andes sys- 95% after 21 students and AvA reaching 85%. tem explored the space more widely, were more prone to identify novel solutions, and rarely followed the ideal solu- The cross-tutor curves, by contrast, reach much lower lim- tion path. Students in Pyrenees, by contrast, were far more its. AvP reaches a plateau of over 75% coverage, while PvA homogeneous in their solution process and were limited in reaches a plateau of 60% coverage. This reflects the same the alternative routes they explored. Contrary to our expec- trends observed in Tables 1 and 2, where Andes better ex- tations, we found that this variation was not due to simple plains the Pyrenees data than vice-versa; however, neither ordering variations in the simplest of steps but of alterna- dataset completely covers the other. On the one hand this tive strategy selection for the higher-level domain principles. result is somewhat problematic as it indicates that prior data This is largely consistent with the design decisions that mo- has a limited threshold for novel tutors or novel versions of a tivated both systems and with the results of prior studies. We also found that the state-based hint generation method Building Expert Systems. Addison-Wesley Publishing used in the Hint Factory can be applied to the Andes and Company Inc., Reading, Massachusetts, U.S.A., 1983. Pyrenees data given a suitable state representation. For this [10] A. Hicks, B. Peddycord III, and T. Barnes. Building analysis we opted for a set-based representation given the Games to Learn from Their Players: Generating Hints absence of strong ordering constraints across the principles. in a Serious Game. In Intelligent Tutoring Systems We then completed a cold-start analysis to show that the (ITS), pages 312–317, 2014. cross-tutor data could be used to bootstrap the construction [11] S. Katz, A. Lesgold, E. Hughes, D. Peters, G. Eggan, of hints for a novel system but does not provide for complete M. Gordin, and L. Greenberg. Sherlock 2: An coverage. intelligent tutoring system built on the lrdc framework. In C. P. Bloom and R. B. Loftin, editors, Ultimately we believe that the techniques used for data- Facilitating the Development and Use of Interactive driven hint generation have direct application to model- Learning Environments, Computers, Cognition, and based systems. Data-driven analysis can be used to iden- Work, chapter 10, pages 227 – 258. Lawrence Erlbaum tify the behavioral differences between closely related sys- Associates, Mawah New Jersey, 1998. tems and, we would argue, changes from one version of a [12] R. Kumar, M. E. Roy, B. Roberts, and J. I. Makhoul. system to another. We also found that these changes can Toward automatically building tutor models using be connected to the specific design decisions made during multiple behavior demonstrations. In development. Further, we found that data-driven methods S. Trausan-Matu, K. E. Boyer, M. Crosby, and can be applied to model-based tutoring data to generate K. Panourgia, editors, Proceedings of the 12th state-based hints. We believe that hint information of this International Conference on Intelligent Tutoring type may be used to supplement the existing domain models Systems, LNCS 8474, pages 535 – 544. Springer to produce more user-adaptive systems. In future work we Verlag, 2014. plan to apply these analyses to other appropriate datasets [13] B. Peddycord III, A. Hicks, and T. Barnes. and to test the incorporation of state-driven hints or hint Generating Hints for Programming Problems Using refinement to existing domain models. Intermediate Output. In Proceedings of the 7th International Conference on Educational Data Mining 6. ACKNOWLEDGMENTS (EDM 2014), pages 92–98, 2014. Work supported by NSF Grant #1432156 “Educational Data [14] C. Piech, M. Sahami, J. Huang, and L. Guibas. Mining for Individualized Instruction in STEM Learning En- Autonomously Generating Hints by Inferring Problem vironments” Min Chi & Tiffany Barnes Co-PIs. Solving Policies. In Learning at Scale (LAS), 2015. [15] K. Rivers and K. Koedinger. Automatic generation of 7. REFERENCES programming feedback: A data-driven approach. In [1] T. Barnes and J. Stamper. Toward Automatic Hint The First Workshop on AI-supported Education for Generation for Logic Proof Tutoring Using Historical Computer Science (AIEDCS 2013), 2013. Student Data. In Intelligent Tutoring Systems (ITS), [16] K. Rivers and K. Koedinger. Automating Hint pages 373–382, 2008. Generation with Solution Space Path Construction. In [2] T. Barnes and J. C. Stamper. Automatic hint Intelligent Tutoring Systems (ITS), 2014. generation for logic proof tutoring using historical [17] J. C. Stamper, M. Eagle, T. Barnes, and M. J. Croy. data. Educational Technology & Society, 13(1):3–12, Experimental evaluation of automatic hint generation 2010. for a logic tutor. I. J. Artificial Intelligence in [3] M. Chi and K. VanLehn. Eliminating the gap between Education, 22(1-2):3–17, 2013. the high and low students through meta-cognitive [18] K. VanLehn, C. Lynch, K. G. Schulze, J. A. Shapiro, strategy instruction. In Intelligent Tutoring Systems R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and (ITS), volume 5091, pages 603–613, 2008. M. Wintersgill. The andes physics tutoring system: [4] M. Chi and K. VanLehn. Meta-Cognitive Strategy Five years of evaluations. In C. Looi, G. I. McCalla, Instruction in Intelligent Tutoring Systems: How, B. Bredeweg, and J. Breuker, editors, Artificial When, and Why. Educational Technology & Society, Intelligence in Education - Supporting Learning 3(1):25–39, 2010. through Intelligent and Socially Informed Technology, [5] A. Corbett. Cognitive computer tutors: Solving the Proceedings of the 12th International Conference on two-sigma problem. In User Modeling 2001, pages Artificial Intelligence in Education, AIED 2005, July 137–147, 2001. 18-22, 2005, Amsterdam, The Netherlands, volume [6] M. Eagle and T. Barnes. Exploring Differences in 125 of Frontiers in Artificial Intelligence and Problem Solving with Data-Driven Approach Maps. In Applications, pages 678–685. IOS Press, 2005. Educational Data Mining (EDM), 2014. [19] K. VanLehn, C. Lynch, K. G. Schulze, J. A. Shapiro, [7] M. Eagle and T. Barnes. Exploring Networks of R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and Problem-Solving Interactions. In Learning Analytics M. Wintersgill. The andes physics tutoring system: (LAK), 2015. Lessons learned. I. J. Artificial Intelligence in [8] D. Fossati, B. D. Eugenio, and S. Ohlsson. I learn Education, 15(3):147–204, 2005. from you, you learn from me: How to make iList learn [20] K. VanLehn and B. van de Sande. The Andes physics from students. In Artificial Intelligence in Education tutoring system: An experiment in freedom. Advances (AIED), 2009. in intelligent tutoring systems., pages 421–443, 2010. [9] F. Hayes-Roth, D. A. Waterman, and D. B. Lenat. An Exploration of Data-Driven Hint Generation in an Open-Ended Programming Problem Thomas W. Price Tiffany Barnes North Carolina State University North Carolina State University 890 Oval Drive 890 Oval Drive Raleigh, NC 27606 Raleigh, NC 27606 twprice@ncsu.edu tmbarnes@ncsu.edu ABSTRACT lems have a large, often infinite, space of possible states. Data-driven systems can provide automated feedback in the Even a relatively simple programming problem may have form of hints to students working in problem solving environ- many unique goal states, each with multiple possible solu- ments. Programming problems present a unique challenge tion paths, leaving little overlap among student solutions. to these systems, in part because of the many ways in which Despite this challenge, a number of attempts have been a single program can be written. This paper reviews current made to adapt the Hint Factory to programming problems [9, strategies for generating data-driven hints for programming 12, 16]. While these approaches have been generally success- problems and examines their applicability to larger, more ful, they are most effective on small, well structured pro- open-ended problems, with multiple, loosely ordered goals. gramming problems, where the state space cannot grow too We use this analysis to suggest directions for future work to large. This paper explores the opposite type of problem: one generate hints for these problems. that has a large state space, multiple loosely ordered goals, unstructured output and involves creative design. Each of 1. INTRODUCTION these attributes poses a challenge to current data-driven hint Adaptive feedback is one of the hallmarks of an Intelli- generation techniques, but they are also the attributes that gent Tutoring System. This feedback often takes the form make such problems interesting, useful and realistic. In this of hints, pointing a student to the next step in solving a paper, we will refer to these as open-ended programming problem. While hints can be authored by experts or gener- problems. We investigate the applicability of current tech- ated by a solver, more recent data-driven approaches have niques to these problems and suggest areas for future re- shown that this feedback can be automatically generated search. from previous students’ solutions to a problem. The Hint Factory [18] has successfully generated data-driven hints in a The primary contributions of this paper are 1) a review of number of problem solving domains, including logic proofs [2], current data-driven hint generation methods for program- linked list problems [5] and a programming game [12]. The ming problems, 2) an analysis of those methods’ applica- Hint Factory operates on a representation of a problem called bility to an open-ended problem and 3) a discussion of the an interaction network [4], a directed graph where each ver- challenges that need to be addressed before we can expect tex represents a student’s state at some point in the problem to generate hints for similar problems. solving process, and each edge represents a student’s action that alters that state. A solution is represented as a path 2. CURRENT APPROACHES from the initial state to a goal state. A student request- Current approaches to generating data-driven programming ing a hint is matched to a previously observed state and hints, including alternatives to the Hint Factory [10, 13, 17], directed on a path to a goal state. The Hint Factory takes can be broken down into three primary components: advantage of the intuition that students with the same ini- tial state and objective will follow similar paths, producing a well-connected interaction network. When this occurs, even 1. A representation of a student’s state and a method a relatively small sample of student solutions can be enough for determining when one state can be reached from to provide hints to the majority of new students [1]. another, meaning they are connected in the network 2. An algorithm that, given a student’s current state, con- While problems in many domains result in well-connected structs an optimal path to a goal state. Often we sim- networks, this is not always the case. Programming prob- plify this to the problem of picking the first state on that path 3. A method to present this path, or next state, to the student in the form of a hint For the purposes of this paper, we limit our discussion to the first step in this process. (For a good discussion of the second step, see an analysis by Piech et al. [13], compar- ing path selection algorithms.) While in some domains this first step is straightforward, in programming tasks, espe- inition. Hicks and Peddycord [7, 12] used the Hint Factory cially open-ended problems, it is likely the most challeng- to generate hints for a programming game called Bots, in ing. The simplest approach is to take periodic snapshots of which the player writes a program to direct a robot through a student’s code and treat these as states, connecting con- various tasks in a 3D level. They chose to represent the state secutive snapshots in the network. However, because two of a player’s program as the final state of the game world students’ programs are unlikely to match exactly, this ap- after the program was executed. They compared the avail- proach is likely to produce a very sparse, poorly connected ability of hints when using this “world state” model with a network, making it difficult to match new students to prior traditional “code state” model, and found that using world solution attempts. A variety of techniques have been pre- states significantly reduced the total number of states and sented to address this problem, which can be grouped into increased the availability of hints. The challenge with this three main strategies: canonicalization, connecting states approach, as noted by the authors, is the generation of ac- and alternative state definitions. tionable hints. A student may be more capable of making a specific change to her code than determining how to effect 2.1 Canonicalization a specific change in the code’s output. Canonicalization is the process of putting code states into a standardized form, often by removing semantically unim- 3. AN OPEN-ENDED PROBLEM portant information, so that trivial difference do not prevent The above techniques have all shown success on smaller, two states from matching. Rivers and Koedinger [15] present well-structured problems, with ample data. We want to in- a method for canonicalizing student code by first represent- vestigate their applicability to an open-ended problem, as ing it as an Abstract Syntax Tree (AST). Once in this form, described in Section 1, where this is not the case. The pur- they apply a number of functions to canonicalize the code, pose of this paper is not to create actionable hints, nor are including normalizing arithmetic and boolean operators, re- we attempting to show the failures of current methods by moving unreachable and unused code, and inlining helper applying them to an overly challenging task. Rather, our functions. After performing this canonicalization on a set purpose is exploratory, using a small dataset to identify ar- of introductory programming problems, they found that a eas of possible future work, and challenges of which to be median 70% of states had at least one match in the net- mindful when moving forward with hint generation research. work. Jin et al. [9] represent a program’s state as a Linkage Graph, where each vertex is a code statement, and each di- We collected data from a programming activity completed rected edge represents an ordering dependency, determined by 6th grade students in a STEM outreach program called by which variables are read and assigned to in each state- SPARCS [3]. The program, which meets for half-day ses- ment. This state representation allows the Hint Factory to sions approximately once a month during the school year, ignore statement orderings which are not important to the consists of lessons designed and taught by undergraduate execution of the program. Lazar and Bratko [10] use the and graduate students to promote technical literacy. The actual text of Prolog code to represent a student’s state, class consisted of 17 students, 12 male and 5 female. and then canonicalize the code by removing whitespace and normalizing variable names. The activity was a programming exercise based on an Hour of Code activity from the Beauty and Joy of Computing cur- 2.2 Connecting States riculum [6]. It was a tutorial designed to introduce novices Even with canonicalization, a student requesting a hint may to programming for the first time. The exercise had users not match any existing state in the network. In this case, we create a simple web-based game, similar to whack-a-mole, can look for a similar state and create a connection between in which players attempt to click on a sprite as it jumps them. Rivers and Koedinger [16] use normalized string edit around the screen to win points. The exercise was split into distance as a similarity metric between two program states. 9 objectives, with tutorial text at each stage. Students were They connect any two states in the network which have at not required to finish an objective before proceeding. A fin- least 90% similarity, even if no historical data connect these ished project required the use of various programming con- states. Additionally, they use a technique called path con- cepts, including events, loops, variables and conditionals. struction to generate new solution paths from a given state The students used a drag-and-drop, block-based program- to a nearby, unconnected goal state by searching for a series ming language called Tiled Grace [8], which is similar to of insertions, deletions and edits to their AST that will trans- Scratch [14]. The user writes a program by arranging code form it into the goal state [17]. They also use this method blocks, which correspond directly to constructs in the Grace to discover new goal states which may be closer to the stu- programming language. The editor also supports switching dent’s current state. Jin et al. [9] use a similar technique to textual coding, but this feature was disabled. A screen- to transform their Linkage Graphs to better match the cur- shot of the activity can be seen in Figure 1. rent state of a student when no direct matches can be found in the interaction network. Piech et al. [13] use path con- During the activity, the students were allowed to go through struction to interpolate between two consecutive states on the exercise at their own pace. If they had questions, the a solution path which differ by more than one edit. This students were allowed to ask for help from the student vol- is useful to smooth data when student code is recorded in unteers. Students were stopped after 45 minutes of work. shapshots that are too far apart. Snapshots of a student’s code were saved each time it was run and periodically throughout the session. Occasional 2.3 Alternate State Definitions technical issues did occur in both groups. One student had Another approach is to forego the traditional code-based severe technical issues, and this student’s data was not an- representation of a student’s state, and use an alternate def- alyzed (and is not reflected in the counts above). Students Raw Canonical Ordered Total States 2380 1781 1656 % Unique 97.5% 94.8% 92.8% Mean NU Count 3.44 3.95 2.82 Median NU Count 2 2 2 Mean % Path Unique 89.9% 83.0% 78.9% Standard Deviation (6.67) (10.5) (13.3) Table 1: Various measures of the sparseness of the interaction network for the raw, canonicalized, and ordered-canonicalized state representations. Mean and median NU counts refer to the number of stu- dents who reached each non-unique state. Figure 1: A screenshot of the Hour of Code activ- ent code states to be merged in the process. We therefore ity that students completed. Students received in- see this as an upper bound on the value of removing unim- structions on the left panel. The center panel was a portant orderings from a code state. We recomputed our work area, and students could drag blocks in from metrics for the ordered-canonicalized interaction network as the bottom-right panel. The top-right panel allowed well. The results can be seen in Table 1. Our later analyses students to test their games. use the unsorted, canonicalized code representation. These results indicate that canonicalization does little to re- produced on average 148.5 unique code states and accom- duce the sparsity of the state space, with students spending plished between 1 and 6 of the activity’s objectives, averag- most of their time in states that no other student has seen. ing 3.2 objectives per student. For comparison, recall Rivers and Koedinger found 70% of states in a simple programming problem had a match after 4. ANALYSIS canonicalization [15], though they were using a much larger We attempted to understand the applicability of each of the dataset. In our dataset, it is unlikely that we would be able techniques discussed in Section 2 to our dataset. However, to find a direct path from a new student’s state to a goal since the output of our program was a game that involved state in order to suggest a hint. nondeterminism, we felt it would be inappropriate to at- tempt to represent a program’s state as the result of its 4.2 Connecting States execution. We therefore focused on the first two strategies, To address this, we explored the feasibility of connecting a canonicalization and connecting states. new student’s state to a similar, existing state in the net- work. It is unclear how close two code states should be 4.1 Canonicalization before it is appropriate to connect them as in [16], or to Our initial representation of a student’s code state was a generate a path between them as in [17]. It certainly de- tree, where each code block was a node, and its children in- pends on the state representation and distance metric used. cluded any blocks that were nested inside of it. In this way, Rather than identifying a cutoff and measuring how often our representation was similar to Rivers and Koedinger’s these techniques could be applied, we chose to visualize the ASTs. To get a baseline for the sparsity of our dataset, distance between two students and make qualitative obser- we first analyzed the code states without performing any vations. Because our code states were already represented canonicalization. We calculated the total number of states as trees, we used Tree Edit Distance (TED) as a distance in the interaction network and the percentage which were metric. While Rivers and Koedinger reported better success only reached by one student. Of those states reached by with Levenshtein distance [16], we believe that TED is the multiple students, we calculated the mean and median num- most appropriate distance metric for block code, where tree ber of students who reached them. We also calculated the edit operations correspond directly to user actions. percentage of each student’s states that were unreached by any other student in the dataset. For each pair of students, A and B, we created an N by M distance matrix, D, where N is the number of states in A’s We then canonicalized the data by removing variable names solution path, and M is the number of states in B’s solution and the values of number and string literals. Our problem path. Di,j = d(Ai , Bj ), where d is the TED distance func- featured very few arithmetic or logical operators, and these tion, Ai is the ith state of A and Bj is the jth state of B. were generally not nested, so we did not normalize them, We used the RTED algorithm [11] to calculate the distance as suggested by Rivers and Koedinger [15]. We reran our function, putting a weight of 1.0 on insertions, deletions and analyses on the canonicalized data. To ensure that we had replacements. We also omitted any state which was identical effectively removed all unimportant ordering information, to its predecessor state. We normalized these values by di- we recursively sorted the children of each node in the tree. viding by the maximum value of Di,j , and plotted the result This effectively removed any ordering information from a as an image. Three such images can be seen in Figure 2. student’s code state, and kept only hierarchical information. This is somewhat more extreme than the Linkage Graphs of We also calculated the “path” through this matrix that passes Jin et al. [9], and it does allow two meaningfully differ- through the least total distance. This does not represent a Mean Median Max Farthest 1 0.25 (0.27) 0.15 (0.29) 0.76 (0.56) 2.23 (0.75) 2 4.88 (3.93) 4.95 (4.34) 9.18 (5.74) 12.73 (6.10) 4 4.92 (2.77) 4.83 (2.78) 10.11 (3.69) 14.67 (4.77) 5 7.79 (1.32) 7.75 (1.41) 13.17 (1.72) 18.17 (1.72) 6 7.49 (1.11) 7.76 (1.37) 13.17 (0.98) 18.67 (1.75) Table 2: For each objective, average distances (and standard deviations) of minimum-distance student pairs, using the mean, median and max metrics. For reference, the final column represents the av- erage maximum distance each student moved from the start state while completing the objective. A visual inspection of these matrices reveals that while many student pairs are quite divergent, some show a notable close- ness throughout the exercise. In order to quantify these results, we developed a set of distance metrics between stu- dents. First, the distance matrix and the minimum-distance “path” were calculated for the two students. The path is comprised of pairs of states, and for each pair, we recorded the tree edit distance between the states. From this list of distances, we calculated the mean, median and maximum distances between the two students. We looked at each ob- Figure 2: Three distance matrices, each comparing jective in the exercise, and isolated the relevant subpath of two students, where each pixel represents the TED each student who completed that objective. We paired each between two states. Lighter shades of gray indi- of these subpaths with the most similar subpath in the set, cate smaller distances. The green/yellow line shows using the mean, median and max distance metrics. Table 2 the path through the matrix with minimized total shows the average values of these minimized pairs of stu- distance, with yellow shades indicating smaller dis- dents, using each metric. Objective 3, and objectives 7-9 tances. Red crosses indicate where both students were omitted, as too few students completed them. met an objective. The top-left figure compares two students as they completed objectives 1-4 in the ex- ercise. In this example, the minimum-distance line 5. DISCUSSION crosses each objective. The top right figure also de- We have attempted to apply a meaningful canonicalization picts two students completing objective 4, but the to our state space, which did serve to reduce the number darker colors and straighter line indicate less align- of states by 30.4%. However, as seen in Table 1, even af- ment. The bottom figure depicts two students com- ter the strongest canonicalization, over 90% of the states in pleting objectives 1-6, with high alignment. the interaction network had only been reached by one stu- dent, with an average 78.9% of the states in each student’s solution path being unique to that student. It seems that path through the interaction network, but rather an align- our approach to canonicalization is insufficient to produce a ment between the states of student A and those of student meaningful reduction of the state space, though it is possible B. Each pixel of the line represents a pairing of a state a more stringent canonicalization would be more effective. from A with a state from B, such that these pairings are contiguous and represent the smallest total distance. While Connecting existing states seems to be a more promising ap- we do not suggest applying this directly as a strategy for proach, giving us the ability to link new states to previously hint generation, it serves an a useful visual indicator of the observed states, even when they do not match exactly. Our compatibility of two students for hinting purposes. distance matrices indicate that some students take parallel, or slowly diverging solution paths, which suggests that they An alternate approach would have been to pair each state in may be useful to each other in the context of hinting. As the interaction network with its closest pair from any other shown in Figure 2, students are often closest together when student, and use this as a measure of how sparse the network completing the same objective. This may seem self-evident, was. We chose to compare whole students, rather than indi- but it does indicate that our distance metric is meaning- vidual states, because we felt that the former could lead to ful. It is more difficult to put the actual TED values into strange hinting behavior. Imagine, for instance, that a stu- context. Students get, on average, farther away from their dent requests a hint, which initially points to a state from closest paired student as they complete more objectives, but student B, but at the very next step requests a hint that this average distance does not exceed 8 tree edits during the points instead to student C. Perhaps the attributes that first 6 objectives. To put that number into context, during make the student’s state similar to that of B are different this same time students do not, on average, get more than from those that make the state similar to C. The resulting 19 tree edits away from the start state. This suggest that hints would be at best confusing, and at worst conflicting. there is certainly hint-relevant knowledge in these paired stu- dents, but that it may be difficult to harness this knowledge [3] V. Cateté, K. Wassell, and T. Barnes. Use and to generate a hint. development of entertainment technologies in after school STEM program. In Proceedings of the 45th ACM technical symposium on Computer science 5.1 Limitations education, pages 163–168, 2014. It is important to note that this analysis is an exploratory [4] M. Eagle, M. Johnson, and T. Barnes. Interaction case study, and makes no strong claims, only observations. Networks: Generating High Level Hints Based on We studied data from only 17 novice programmers, and the Network Community Clustering. In International problem we analyzed was highly complex, involving multiple Educational Data Mining Society, pages 164–167, control structures, loosely ordered objectives, and unstruc- 2012. tured output. This makes the problem quite dissimilar from [5] D. Fossati, B. D. Eugenio, and S. Ohlsson. I learn previous problems that have been been studied in the con- from you, you learn from me: How to make iList learn text of hint generation, making it difficult to determine what from students. In Artificial Intelligence in Education observations should be generalized. (AIED), 2009. [6] D. Garcia, B. Harvey, L. Segars, and C. How. AP CS 5.2 Future Work Principles Pilot at University of California, Berkeley. Rivers and Koedinger [16], as well as Jin et al. [9] note ACM Inroads, 3(2), 2012. the limitations of their methods for larger problems, and [7] A. Hicks, B. Peddycord III, and T. Barnes. Building each suggest that breaking a problem down into subprob- Games to Learn from Their Players: Generating Hints lems would help to address this. Lazar and Bratko [10] at- in a Serious Game. In Intelligent Tutoring Systems tempted this in their Prolog tutor by constructing hints for (ITS), pages 312–317, 2014. individual lines of code, which were treated as independent [8] M. Homer and J. Noble. Combining Tiled and Textual subproblems. Similarly, we may be able to isolate the sub- Views of Code. In Proceedings of 2nd IEEE Working section of a student’s code that is currently relevant, and Conference on Software Visualization, 2014. treat this as an independent problem. The hierarchical na- [9] W. Jin, T. Barnes, and J. Stamper. Program ture of block-based coding environments lends itself to this representation for automatic hint generation for a practice, making it an appealing direction for future work. data-driven novice programming tutor. In Intelligent Tutoring Systems (ITS), 2012. Our work makes the simplifying assumption that unweighted [10] T. Lazar and I. Bratko. Data-Driven Program TED is a reliable distance metric for code, but future work Synthesis for Hint Generation in Programming Tutors. should investigate alternative metrics. This might include a In Intelligent Tutoring Systems (ITS). Springer, 2014. weighted TED metric, which assigns different costs to inser- [11] M. Pawlik and N. Augsten. RTED: a robust algorithm tions, deletions and replacements, or even to different types for the tree edit distance. Proceedings of the VLDB of nodes (e.g. deleting a for-loop node might cost more than Endowment, 5(4):334–345, 2011. a function call node). Regardless of the metric used, once [12] B. Peddycord III, A. Hicks, and T. Barnes. two proximate states are identified, it is still an open ques- Generating Hints for Programming Problems Using tion how this information can be best used for hint gener- Intermediate Output. In Proceedings of the 7th ation. It is possible to construct a path between the states International Conference on Educational Data Mining and direct a student along this path. However, future work (EDM 2014), pages 92–98, 2014. might also investigate how to extract hint-relevant informa- [13] C. Piech, M. Sahami, J. Huang, and L. Guibas. tion from one state and apply it to a similar state directly. Autonomously Generating Hints by Inferring Problem Solving Policies. In Learning at Scale (LAS), 2015. Because of the nature of our problem’s output, we did not [14] M. Resnick, J. Maloney, H. Andrés, N. Rusk, explore non-code-based state representations, as described E. Eastmond, K. Brennan, A. Millner, E. Rosenbaum, in Section 2.3. It would still be worth investigating how this J. Silver, B. Silverman, and Y. Kafai. Scratch: might be applied to open-ended problems. For instance, a programming for all. Communications of the ACM, code state could be represented as a boolean vector, indi- 52(11):60–67, 2009. cating whether the code has passed a series of Unit Tests, [15] K. Rivers and K. Koedinger. A canonicalizing model and hints could direct the student to the current flaw in for building programming tutors. In Intelligent their program. However, creating actionable hints from this Tutoring Systems (ITS), 2012. information would pose a significant challenge. [16] K. Rivers and K. Koedinger. Automatic generation of programming feedback: A data-driven approach. In 6. REFERENCES The First Workshop on AI-supported Education for [1] T. Barnes and J. Stamper. Toward Automatic Hint Computer Science (AIEDCS 2013), 2013. Generation for Logic Proof Tutoring Using Historical [17] K. Rivers and K. Koedinger. Automating Hint Student Data. In Intelligent Tutoring Systems (ITS), Generation with Solution Space Path Construction. In pages 373–382, 2008. Intelligent Tutoring Systems (ITS), pages 329–339, [2] T. Barnes, J. Stamper, L. Lehman, and M. Croy. A 2014. pilot study on logic proof tutoring using hints [18] J. Stamper, M. Eagle, T. Barnes, and M. Croy. generated from historical student data. In Proceedings Experimental evaluation of automatic hint generation of the 1st Annual International Conference on for a logic tutor. Artificial Intelligence in Education Educational Data Mining (EDM), pages 1–5, 2008. (AIED), 22(1):3–17, 2013. Studio: Ontology-Based Educational Self-Assessment Christian Weber Réka Vas Corvinno Technology Corvinus University Transfer Center of Budapest Budapest, Hungary Budapest, Hungary cweber@corvinno.com reka.vas@uni-corvinus.hu ABSTRACT cognitive behavior. In his discussion, eight out of nine knowledge Students, through all stages of education, grasp new knowledge types underline that knowledge in the scope of learning is in the context of knowledge memorized all through their previous interrelated and strongly associated with previous experiences education. To self-predict personal proficiency in education, self- [2]. As such, a supporting solution for self-assessment should assessment acts as an important learning feedback. The in-house grasp and formalize the knowledge to assess in the context of developed Studio suit for educational self-assessment enables to related knowledge. model the educational domain as an ontology-based knowledge The Studio suit for educational self-assessment, presented in this structure, connecting assessment questions and learning material paper, provides here a software solution for testing the personal to each element in the ontology. Self-assessment tests are then proficiency in the context of related knowledge. It enables to created by utilizing a sub-ontology, which frames a tailored model areas of education as a substantial source for assessment testing environment fitting to the targeted educational field. In and narrows the gap between a potentially flawed self-prediction this paper we give an overview of how the educational data is and the real proficiency, by offering an objective and adaptive modeled as a domain ontology and present the concepts of online knowledge-test. To follow the natural learning process different relations used in the Studio system. We will deduct how and enable an easy extension, the software embeds the assessed the presented self-assessment makes use of the knowledge knowledge into a network of contextual knowledge, which structure for online testing and how it adapts the test to the enables to adapt the assessment to the responses of the students. performance of the student. Further we highlight where potentials are for the next stages of development. This paper will give an overview of the Studio educational domain ontology and the aspects of the system supporting Keywords personalized self-assessment. Further it will highlight potentials for data mining on the gathered educational data with an outlook Education, adaptive test, self-assessment, educational ontology on the next stages of evaluation. 1. INTRODUCTION 2. THE STUDIO APPROACH FOR SELF- Students exploring new fields of education are always confronted ASSESSMENT with questions regarding their individual progress: how much do The basic concept of Studio is to model the focused education as they know after iterations of learning, in which directions should an interrelated knowledge structure, which divides the education they progress to fill the field most effectively, how to grasp the into sub-areas and knowledge items to know. The managed outline and details of the field and how much of their time structure formalizes the relation between knowledge areas as a should they invest in learning? Especially in higher education, learning context and models the requirements to master specific where learning becomes a self-moderated, personalized process, parts of the education. This structure is used to create and students are in need of continuous self-assessment to capture support knowledge tests for students. Through this combination their current state of proficiency. At the same time, the of assessment and knowledge structure, the student gains the unframed, informal self-prediction of students regarding their freedom to explore not only single knowledge items but the personal skills is often substantive and systematically flawed [1]. education in the context of related knowledge areas, while the Here a systematic and objective solution for self-assessment is embedded requirements are used to map the modeled knowledge substantial to prevent a wrong or biased self-evaluation and to against the expected educational outcome. support the self-prediction of the personal proficiency. The assessment-system is designed to be accompanied by phases Following Jonassen, knowledge in education could be split into of learning within the system, where the student gets access to nine types across three categories to capture the human’s learning material, based on and supported by the test feedback. This combined approach offers a unique self-assessment to the students, where the backing knowledge context is used to adapt the assessment in dependency of the test performance of the student. Before any regular examination students may use Studio to assess their knowledge on their own. It is the tutor’s responsibility to set the course of self-assessment test in Studio system by selecting knowledge areas and sub-knowledge areas aspects of education as the curriculum or aspects relevant for the which are relevant for the target education from the domain task of learning and course creation [8][9][10] or describe the ontology. Then the frame will be automatically completed with design, use and retrieval of learning materials till creating elements from the ontology which detail the selected knowledge courses [11], as well as directly the learner within the education areas and are modeled as required for this part of the education. [12]. As the system stores assessment questions for each knowledge Within the area of educational ontologies, domain ontologies element, Studio will then automatically prepare an assessment tend to model too specific details of the education, in an attempt test, based on the defined selection and the domain ontology. The to model the specific field as complete as possible. This enables resulting knowledge-test is then accessible as a self-assessment a comprehensive view on the field but it comes at the cost of test for the student, who explores the backed knowledge generality, with the potential to be inflexible to handle changes. structure, which pictures the expected learning outcome, in Other concepts model the education across different ontologies, cycles of testing, reflection and learning. The process of test matching concepts like the learner, the education and the course definition and assessment is shown in Figure 1, while the result description, introducing a broad horizon but with additional preparation for reflection and learning is discussed in section 2.5. overhead to combine modelled insights and reason on new instances. The appeal of the Studio educational ontology is the size and focus of the main classes and their relationships between each other. The knowledge to learn is the main connecting concept in the core of education. It enables a great flexibility to be resourceful for different education related questions. An example is here the business process management extension PROKEX, which maps process requirements against knowledge areas to create assessment test, reflecting the requirements of attached processes [13]. An important factor in learning is the distance between the expectation of the tutor and the learning performance of the student. Here a short cycle of repeated assessment and learning is a major factor for a better personal learning performance [14]. This aspect directly benefits from the focused concentration on knowledge-areas as the main exchange concept between students and tutors. As even further the close connections between learners and educators via direct tutoring is one major enabler for computer aided systems [15], each step towards a more direct interaction through focused concepts is an additional supporter. The class structure fuses the idea of interrelated knowledge with a model of the basic types of educational concepts, involved in situations of individual learning. Figure 2 visualizes the class concepts as knowledge elements, together with the relation types, used to model the dependencies between different aspects of knowledge and learning within the educational ontology. The Knowledge Area is the super-class and core-concept of the ontology. The ontology defines two qualities of main relations between knowledge areas: Knowledge areas could be a sub- knowledge area of other knowledge areas with the “has_sub- Figure 1: The overall design, assess and reflection cyle of the knowledge_area” relation or be required for another knowledge system. area with the “requires_knowledge_of” relation. A knowledge area may have multiple connected knowledge areas, linked as a 2.1 The Educational Domain Ontology requirement or sub-area. The “requires_knowledge_of” relation The Studio system is based on a predesigned educational defines that a node is required to complete the knowledge of a ontology, explained in detail by Vas in [3]. Domain ontology is a parent knowledge area. This strict concept models a requirement frequently used term in the field of semantic technologies and dependency between fields of knowledge in education and yields underlines the storage and conceptualization of domain the potential to assess perquisites of learning, analog to the basic knowledge and is often used in a number of projects and idea of perquisites within knowledge spaces, developed by solutions [4][5][6] and could address a variety of domains with Falmagne [16]. different characteristics in their creation, structure and granularity, depending on the aim and the modeling person [7]. A Education is a structured process which splits the knowledge to specialization in terms of the field is the educational domain learn into different sub-aspects of learning. Knowledge areas in ontology which is a domain ontology adapted to the area and the ontology are extended by an additional sub-layer of concepts of education. They could target to model different knowledge elements in order to effectively support educational and testing requirements. Figure 2 visualizes the sub-elements 2.3 Creating and Maintaining Tests and their relations. By splitting the assessed knowledge into sub- The creation and continues maintenance of the domain ontology concepts, the coherence and correlation of self-assessment is a task of ontology engineering. The ontology engineer (the questions could be expressed more efficiently and with the ontologist), creates, uses and evaluates the ontology [17], with a potential of a more detailed educational feedback. strong focus on maintaining the structure and content. Within Studio, this process is guided and supported by a specialized Has sub- Requires administration workflow and splits in three consecutive task Knowledge Area areas, in line with decreasing access rights: knowledge area knowledge of Ontology engineering (instance level): The creation and linking of instances of the existing knowledge-area Part of Part of classes into the overall domain ontology. Part of Test definition: Knowledge areas, which are relevant to a target self-assessment test, are selected and Refers grouped into specialized containers called Concept Basic Concept Example to Refers to Groups (CG). These concept groups are organized into a tree of groups, in line with the target of the assessment. The final tree in this regards captures a Refers to sub-ontology. Concept groups are internally organized based on the overall ontology and include all relations Pr Co em nc between knowledge elements, as defined within the ise lu s domain ontology. io n Theorem Refers to Question and learning material creation: Questions and learning materials alike are directly connected to single knowledge areas within the designed test frame and get imported, if already existing, from the domain Test Learning ontology. More questions and learning materials are Questions Material defined now, in line with the additional need of the Testbank targeted education and are available for future tests. The pre-developed structure of classes and relations is fixed as Figure 2: Model of the educational ontology. the central and integral design of the system. A view of the Theorems express in a condensed and structured way the system interface for administration is provided in Figure 3. The fundamental insights within knowledge areas. They fuse and left area shows the visualization of the current ontology section explain the basic concepts of the depicted knowledge and set in revision and the right area shows the question overview with them in relation to the environment of learning with examples. editing options. Tabs give access to additional editing views, Multiple theorems could be “part_of” a knowledge area. Each including the learning material management and interfaces to theorem may define multiple Basic Concepts as a “premise” or modify relations between nodes and node descriptions. “conclusion”, to structure how the parts of the knowledge area are related. Examples enhance this parts as a strong anchor for 2.4 Adaptive Self-Assessment self-assessment questions and “refer_to” the theorems and basic To prepare an online self-assessment test, the system has to load concepts as a “part_of” one or more knowledge areas. the relevant educational areas from the domain ontology and extract the questions and relations of the filtered knowledge 2.2 The Testbank areas. In order to connect the task of self-assessment with the model of The internal test algorithm makes use of two assumptions: the educational domain, the system integrates a repository of assessment questions. Each question addresses one element of Knowledge-area ordering: As the main knowledge the overall knowledge and is directly associated with one areas are connected through “requires_knowledge_of” knowledge area or knowledge element instance within the and “part_of” relations, every path, starting with the ontology. The domain ontology provides here the structure for the start-element, will develop on average from general online self-assessment while the repository of questions concepts to detailed concepts - given that the concept supplements the areas as a test bank. The target of the self- groups in the test definition are also selected and assessment is to continuously improve the personal knowledge ordered to lead from general to more detailed groups. within the assessed educational areas, by providing feedback on Knowledge evaluation dependency: If a person, the performance after each phase of testing. To do so, the Studio taking the test, fails on general concepts he or she will system includes Learning Material connected to the test bank and potentially also fail on more detailed concepts. Further, the knowledge areas, analog to the test questions. The learning if a high number of detailed concepts are failed, the material is organized into sections as a structured text with parent knowledge isn’t sufficiently covered and will be mixed media, as pictures and videos, and is based on a wiki- derived as failed, too. engine to maintain the content, including external links. Figure 3: The main ontology maintenance and administration interface, showing a part of the domain ontology. The filtering is done based on the selection of a tutor, acting as same level. If the learner’s answer is correct, the system will an expert for the target educational area. The tutor chooses activate the child elements of the current node and draw a related areas, which are then created as a Test Definition, random question from the first left child. containing Concept Groups, as described in section 2.3. The Based on the tree shaped knowledge structure, the assessment system then uses the test definition as a filtering list to extract now follows these steps to run the self-assessment, supported by knowledge areas. After the extraction, the structure is cached as the extracted knowledge structure: a directed graph, while the top element of the initial concept group is set as a start element. Beginning with the start-element, 1. Starting from the start-element, the test algorithm will the test will move then through the graph, while administering activate the child knowledge-areas of the start element. the questions connected to knowledge areas and knowledge 2. The algorithm now selects the first child-knowledge elements. area and draws a random question out of the pool of The loading of knowledge-elements follows three steps: available questions for this specific knowledge-element from the test bank. 1. Each type of relation between two knowledge-elements implements a direction for the connection. Assuming 3. If the learner fails the question, the algorithm will the system loads all relations, starting with the start- mark the element as failed and select the next element and ending on a knowledge-element, this knowledge area from the same level. If the learner’s creates a two level structure where the start-node is a answer is correct, the system will activate the child parent-element and all related, loaded elements are elements of the current node and trigger the process for child-elements, as seen below in Figure 4. each child-element. 2. The loading algorithm then selects one child-element and assumes it as a start-element and repeat the loading process of knowledge-elements. 3. When no knowledge-elements for a parent-element could be loaded, the sub-process stops. When all sub- processes have stopped, the knowledge structure is fully covered. The test algorithm will now activate the child knowledge areas of the start element and select the first knowledge area to the left and draw a random question from the selected knowledge area. If Figure 4: Excerpt from the sub-ontology visualization, with the learner fails the question, the algorithm will mark the the visible parent-child relationship, as used in the data- element as failed and selects the next knowledge area from the loading for preparing the self-assessment. An example question is shown below in Figure 5. Further JavaScript framework [18]. The visualization itself is a custom following the testing algorithm, the system dives down within the build, similar to the Ext JS graph function “Radar” and based on domain ontology and triggers questions depending on the the idea of Ka-Ping, Fisher, Dhamija and Hearst [19]. All views learner’s answers and the extracted model of the relevant are able to zoom in and out of the graph, move the current education. In this regards the Studio system adapts the test on the excerpt and offer a color code legend, explaining the meaning of fly to the performance of the learner. Correlating to the idea of the colored nodes. In comparison with state of the art, the adaptation, the learner will later gain access to learning material interface offers no special grouping or additional visualization for each mastered knowledge area. As the learner continues to features like coding information into the size of nodes. Each use the self assessment to evaluate the personal knowledge, he or interface offers an additional textual tree view to explore the she will thus explore different areas of the target education, knowledge-elements or concept groups in a hierarchical listing. following their individual pace of learning. This simple, straightforward approach for visualization correlates with the goal of a direct and easy to grasp feedback through interfaces which have a flat learning curve and enable to catch the functionality in a small amount of time. While this simple visualization is sufficient for the reasonable amount of knowledge-elments within the result view, this alone is not suitable for the domain ontology administration interface, Figure 5: Test interface with a random drawn test question. as seen in Figure 3. Here Studio realizes methodologies to filter and transform the data to visualize. To do so it makes use of two 2.5 Test Feedback and Result Visualization supporting mechanisms: An important aspect of the system is the test feedback and The maximum-level-selector defines the maximum evaluation interface. The educational feedback is one of the main level the system extracts from the domain ontology for enabler for the student to grasp the current state and extend of full screen visualization. the personal education. The domain ontology models the structure and the dependencies of the educational domain, and In combination with the maximum level, the ontologist the grouped test definition extracts the relevant knowledge for could select single elements within the domain the target area or education. As such, the visualization of the ontology. This triggers an on-demand re-extraction of ontology structure extracted for the test, together with the the visualized data, setting the selected knowledge- indication of correct and incorrect answers, represents a map of element as the centre element. The system then loads the knowledge of the learner. the connected nodes, based on their relations into the orientation circles till the maximum defined level is Throughout each view onto the ontology, the system uses the reached. More details about the transformation are in same basic visualization, making use of the Sencha Ext JS [19]. Figure 6: Result visualization as educational feedback for the learner. Together, this selection and transformation mechanism enables Each event stores information about the system in 7 the fluent navigation within the complete domain ontology dimensions, as described in Table 1 below: structure, while re-using the same visualization interface. Table 1: Event blueprint to store events concerning system Figure 6 shows the main view of the result interface. The left interaction. area shows the sub-ontology extracted for the test, while the Attribute Description colored nodes represent the answers to the administered questions. A red node visualizes wrong answers, while orange Event description code Which type of event and what nodes are rejected nodes with correct answers but with an factors are relevant. insufficient number of correctly answered child nodes, Location code On which part of the assessment- indicating a lack of the underlying knowledge. Green nodes process or interface the event has represent accepted nodes with correct answers and a sufficient occurred. amount of correctly answered questions for child nodes. Grey nodes are not administered nodes, which were not yet reached Session identifier Each access of the system is one by the learner, as higher order nodes had no adequate session for one user. acceptance. Numerical value storage Multi-purpose field, filled Even though the target of the system is not a strict evaluation in depending on the event type. number, the evaluation of the percentage of solved and String value storage Multi-purpose field, filled accepted knowledge elements helps the learner to track the depending on the event type. personal progress and could additionally be saved as a report for further consultation. Besides providing an overview of the Event-time The time of the start of the event. self-assessment result, the result interface gives access to the integrated learning material. For every passed node, the learner Item reference A unique reference code, can now open the correlated material and intensify the identifying the correlated item knowledge for successful tested areas. within the ontology. E.g. a Retaking the test in cycles of testing and learning, while question or a knowledge-element adapting the educational interaction, is the central concept of ID. the Studio approach for self-assessment. As a consequence the All events are stored in order of their occurrence, so if no system will not disclose the right answers to questions or explicit end event is defined, the next event for the same learning material for not yet administered knowledge areas, to session and user is acting as the implicit end date. Extending promote an individual reflection on the educational content the existing storage of information within Studio, the new outside of a flat memorization of content. logging system stores the additional events, as shown in Table 2 below: 3. SYSTEM EVALUATION Table 2: Assessment events and descriptions. The system has been used, extended and evaluated in a number of European and nationally funded research projects, including Event type Description applications in business process management and innovation- START_TEST Marks the start of a test. transfer [20], medical education [21] and job market competency matching [22]. END_TEST Marks the end of a test. Currently the system is being evaluated based on a running OPEN_WELCOME_LM The user opened the welcome study with 200 university students in the field of business page. informatics. The study will conclude on two current research OPEN_LM_BLOCK The student opened a learning streams which are improving the systems testing and analysis material block on the test capability. The first direction looks into potentials for the interface. integration of learning styles into adaptive learning systems to offer valuable advice and instructions to teachers and students OPEN_LM The student opened the learning [23]. Within the second direction the question is challenged on material tab on the test interface. how to adapt the presented self-assessment further towards the RATE_LM The student rated the learning performance of the students, based on extracting assessment material. paths from the knowledge structure [24]. CHECK_RESULT The student opened a result page. For each running test, Studio collects basic quantitative data about the number of assigned questions, how often tests are CONTINUE_TEST The student submitted an answer. taken and how many students open which test and when. This FINISH_TEST The test has been finished. is completed by qualitative measures, collecting which SUSPEND_TEST The user suspended the test. questions and knowledge elements the students passed or failed. To conclude further on the mechanisms and impacts of RESUME_TEST The user has restarted a previously Studio within the current study, a new logging system was suspended test. developed, collecting the interaction with the system and SELECT_TEST_ALGO- The algorithm used to actually test detailed information about the feedback as detailed events. RITHM the student is selected. TEST_ALGORITHM_- The behavior of the current test text,” Expert Systems with Applications, vol. 34, no. 2, pp. EVENT algorithm changes, e.g. entering 1474–1480, Feb. 2008. another stage of testing. [5] S.-H. Wu and W.-L. Hsu, “SOAT: a semi-automatic domain ontology acquisition tool from Chinese corpus,” in ASK_TESTQUESTION Sends out a test question to the user to answer. Proceedings of the 19th international conference on Computational linguistics-Volume 2, 2002, pp. 1–5. STUDIO_LOGOUT The user logs out of the Studio [6] M. Missikoff, R. Navigli, and P. Velardi, “The Usable system. Ontology: An Environment for Building and Assessing a To store the events, the system implements an additional Domain Ontology,” in The Semantic Web — ISWC 2002, I. logging database, splitting the concepts of the logging to a star- Horrocks and J. Hendler, Eds. Springer Berlin Heidelberg, schema for efficient extraction, transformation and loading. The 2002, pp. 39–53. logging system is modular and easy to extend with new [7] T. A. Gavrilova and I. A. Leshcheva, “Ontology design concepts and easy to attach to potential event positions within and individual cognitive peculiarities: A pilot study,” the Studio runtime. Together with the existing logging of the Expert Systems with Applications, vol. 42, no. 8, pp. assessment evaluation feedback, this new extension tracks the 3883–3892, May 2015. exploration of the sub-ontology within the assessment and [8] S. Sosnovsky and T. Gavrilova, “Development of enriches the feedback data with context information of the Educational Ontology for C-programming,” 2006. students behavior on the system. [9] V. Psyché, J. Bourdeau, R. Nkambou, and R. Mizoguchi, “Making learning design standards work with an ontology 4. NEXT STEPS of educational theories,” in 12th Artificial Intelligence in The domain ontology offers a functional and semantically rich Education (AIED2005), 2005, pp. 539–546. core for supporting learning and education. Yet not all the [10] T. Nodenot, C. Marquesuzaá, P. Laforcade, and C. semantic potentials are fully leveraged to support and test the Sallaberry, “Model Based Engineering of Learning learner’s progress. The “requires_knowledge_of” relation- Situations for Adaptive Web Based Educational Systems,” requirement is a potential start-concept to model sub-areas as in Proceedings of the 13th International World Wide Web groups which together compose the dependency. This could act Conference on Alternate Track Papers &Amp; Posters, as an additional input for the assessment, where the system New York, NY, USA, 2004, pp. 94–103. derives more complex decision how to further explore the [11] A. Bouzeghoub, C. Carpentier, B. Defude, and F. related parts of the structure [25]. This could also be visualized, Duitama, “A model of reusable educational components enabling the learner to grasp the personal knowledge as a for the generation of adaptive courses,” in Proc. First visible group of concepts. International Workshop on Semantic Web for Web-Based Besides giving colors to the different types of relations, the Learning in conjunction with CAISE, 2003, vol. 3. visualizing of edges between knowledge areas is yet unfiltered, [12] W. Chen and R. Mizoguchi, “Learner model ontology and offering no further support for navigation. A next stage of learner model agent,” Cognitive Support for Learning- implementation could be the introduction of a visual ordering Imagining the Unknown, pp. 189–200, 2004. and grouping of knowledge areas and relations. Underlying [13] G. Neusch and A. Gábor, “PROKEX – INTEGRATED relations of sub-nodes could be interpreted visually through the PLATFORM FOR PROCESS-BASED KNOWLEDGE thickness of relations between nodes, easing the perception of EXTRACTION,” ICERI2014 Proceedings, pp. 3972– complex parts of the domain ontology, especially within 3977, 2014. administration and maintenance tasks. [14] H. L. Roediger III, “Relativity of Remembering: Why the Laws of Memory Vanished,” Annual Review of The feedback of the current evaluation study of Studio will Psychology, vol. 59, no. 1, pp. 225–254, 2008. provide additional insights into the usage of the system by the [15] J. D. Fletcher, “Evidence for learning from technology- students. Based on this new data it is possible to mine profiles assisted instruction,” Technology applications in over time on the knowledge structure. One major application is education: A learning view, pp. 79–99, 2003. here the creation of behavior profiles, as proposed in [23]. [16] J.-C. Falmagne, M. Koppen, M. Villano, J.-P. Doignon, and L. Johannesen, “Introduction to knowledge spaces: 5. REFERENCES How to build, test, and search them,” Psychological [1] D. Dunning, C. Heath, and J. M. Suls, “Flawed self- Review, vol. 97(2), pp. 201–224, 1990. assessment implications for health, education, and the [17] F. Neuhaus, E. Florescu, A. Galton, M. Gruninger, N. workplace,” Psychological science in the public interest, Guarino, L. Obrst, A. Sanchez, A. Vizedom, P. Yim, and vol. 5, no. 3, pp. 69–106, 2004. B. Smith, “Creating the ontologists of the future,” Applied [2] D. Jonassen, “Reconciling a Human Cognitive Ontology, vol. 6, no. 1, pp. 91–98, 2011. Architecture,” in Constructivist Instruction: Success Or [18] “Ext JS API.” [Online]. Available: Failure?, S. Tobias and T. M. Duffy, Eds. Routledge, http://www.objis.com/formationextjs/lib/extjs- 2009, pp. 13–33. 4.0.0/docs/index.html. [Accessed: 04-May-2015]. [3] R. Vas, “Educational Ontology and Knowledge Testing,” [19] K.-P. Yee, D. Fisher, R. Dhamija, and M. Hearst, Electronic Journal of Knowledge Management, vol. 5, no. “Animated exploration of dynamic graphs with radial 1, pp. 123 – 130, 2007. layout,” in IEEE Symposium on Information [4] M. Y. Dahab, H. A. Hassan, and A. Rafea, “TextOntoEx: Visualization, 2001. INFOVIS 2001, 2001, pp. 43–50. Automatic ontology construction from natural English [20] M. Arru, “Application of Process Ontology to improve the [23] H. M. Truong, “Integrating learning styles and adaptive e- funding allocation process at the European Institute of learning system: Current developments, problems and Innovation and Technology.,” in 3rd International opportunities,” Computers in Human Behavior, 2015. Conference on Electronic Government and the [24] C. Weber, “Enabling a Context Aware Knowledge-Intense Information Systems Perspective (EGOVIS)., Munich, Computerized Adaptive Test through Complex Event Germany, 2014. Processing,” Journal of the Sientific and Educational on [21] M., M., Ansari, F., Dornhöfer Khobreh and M. Fathi, “An Forum on Business Information Systems, vol. 9, no. 9, pp. ontology-based Recommender System to Support Nursing 66–74, 2014. Education and Training,” in LWA 2013, 2013. [25] C. Weber and R. Vas, “Extending Computerized Adaptive [22] V. Castello, L. Mahajan, E. Flores, M. Gabor, G. Neusch, Testing to Multiple Objectives: Envisioned on a Case I. B. Szabó, J. G. Caballero, L. Vettraino, J. M. Luna, C. from the Health Care,” in Electronic Government and the Blackburn, and F. J. Ramos, “THE SKILL MATCH Information Systems Perspective, vol. 8650, A. Kő and E. CHALLENGE. EVIDENCES FROM THE SMART Francesconi, Eds. Springer International Publishing, 2014, PROJECT,” ICERI2014 Proceedings, pp. 1182–1189, pp. 148–162. 2014. Graph Analysis of Student Model Networks Julio Guerra Yun Huang School of Information Sciences Intelligent Systems Program University of Pittsburgh University of Pittsburgh Pittsburgh, PA, USA Pittsburgh, PA, USA jdg60@pitt.edu yuh43@pitt.edu Roya Hosseini Peter Brusilovsky Intelligent Systems Program School of Information Sciences University of Pittsburgh University of Pittsburgh Pittsburgh, PA, USA Pittsburgh, PA, USA roh38@pitt.edu peterb@pitt.edu ABSTRACT same for all students and at all times. In this work we ex- This paper explores the feasibility of a graph-based approach plore the idea that it might be beneficial for a student model to model student knowledge in the domain of programming. to include connections between domain KEs that represent The key idea of this approach is that programming concepts some aspects of individual student knowledge rather than are truly learned not in isolation, but rather in combina- domain knowledge. This idea is motivated by the recogni- tion with other concepts. Following this idea, we represent tion that the mastery in many domains is reached as the a student model as a graph where links are gradually added student practices connecting different KEs, i.e., each KE is when the student’s ability to work with connected pairs of practiced in conjunction with other KEs. To address this, we concepts in the same context is confirmed. We also hypothe- build a model represented as a network of KEs that get pro- size that with this graph-based approach a number of tradi- gressively connected as the student successfully works with tional graph metrics could be used to better measure student problems and assessment items containing multiple KEs. As knowledge than using more traditional scalar models of stu- the student succeeds in more diverse items mapped to dif- dent knowledge. To collect some early evidence in favor of ferent KEs, her model gets better connected. this idea, we used data from several classroom studies to correlate graph metrics with various performance and moti- To explore the value of this graph-based representation of vation metrics. student knowledge, we compute different graph metrics (e.g., density, diameter) for each student and analyze them in re- 1. INTRODUCTION lation to student performance metrics and attitudinal ori- Student modeling is widely used in adaptive educational sys- entations drawn from a motivational theory. This analysis tems and tutoring systems to keep track of student knowl- was performed using data collected from 3 cohorts of a Java edge, detect misconceptions, provide targeted support and programming course using the same system and the same give feedback to the student [2]. The most typical overlay content materials. In the remaining part of the paper, we student model dynamically represents the inferred knowl- describe related work, introduce and illustrate the suggested edge level of the student for each knowledge element (KE) approach, describe graph and performance metrics, and re- (also called knowledge component or KC) defined in a do- port the results of the correlation analysis. main model. These knowledge levels are computed as the student answers questions or solves problems that are mapped to the domain KEs. Student models are frequently built over 2. RELATED WORK networked domain models where KEs are connected by pre- Graph representation of student activity is not new. The requisite, is-a, and other ontological relationships that are 2014 version of the Graph-Based Educational Data Mining used to propagate the knowledge levels and produce a more Workshop 1 contains two broad types of related work: the accurate representation of the knowledge of the learner. Since analysis of the networking interaction among students, for these connections belong to domain models, they stay the example work on social capital [14] and social networking in MOOCs [3, 12]; and analyses of learning paths over graph representation of student traces while performing activities in the system [1, 5]. Our work fits in the second type since we model traces of each student interacting with the sys- tem. However, our approach is different as it attempts to combine an underlying conceptual model with the traces of the student learning. 1 http://ceur-ws.org/Vol-1183/gedm2014_proceedings. pdf A considerable amount of work focused on graph repre- sentation of domain models that serve as a basis for over- lay student models. The majority of this work focused on constructing the prerequisite relationships between domain knowledge components (concept, skills) [6, 13]. In this case links established between a pair of concepts represent prereq- uisite - outcome relationship. Another considerable stream of work explored the use of formal ontologies with such re- lationships as is-a and part-of for connecting domain knowl- edge components [7]. Ontological representation, in turn, relates to another stream of work that applies graph tech- niques to structural knowledge representation, for example by analyzing the network properties of ontologies [9]. Figure 1: Exercise jwhile1 The research on graph-based domain models also leads to a stream of work on using Bayesian networks to model the relationships between domain concepts for knowledge prop- path length, and average node weight can be good indicators agation in the process of student modeling [15, 4]. Yet, in of student knowledge compared to the amount of activities both cases mentioned above links between knowledge com- done or overall measures of assessment like success rate on ponents were not parts of individual student model, but ei- exercises. We further explore these graph metrics in relation ther parts of the domain model or student modeling pro- with motivational factors drawn from a learning motivation cess and thus remain the same for all students. In con- theory. trast, the approach suggested in this paper adds links be- tween knowledge components to individual student models 3.1 Domain Model and Content to express combinations of knowledge components that the Our content corpus is composed by a set of 112 interactive given student explored in a problem solving or assessment parameterized exercises (i.e., questions or problems) in the process. This approach is motivated by our belief that in domain of Java programming from our system QuizJet [11]. the programming domain, student knowledge is more effec- Parameterized exercises are generated from a template by tively modeled by capturing student progress when students substituting a parameter variable with a randomly generated needed to apply multiple concepts at the same time. value. As a result each exercise can be attempted multiple times. To answer the exercise the student has to mentally 3. THE APPROACH execute a fragment of Java code to determine the value of a The idea behind our approach is that knowledge is likely to specific variable or the content printed on a console. When be stronger for concepts which are practiced together with the student answers, the system evaluates the correctness, a larger variety of other concepts. We hypothesize, for ex- reports to the student whether the answer was correct or ample, that a student who solves exercises, in which the wrong, shows the correct response, and invites the student concept for-loop is used with post-incremental operator and to “try again”. As a result, students may still try the same post-decremental operator will have a better understanding exercises even after several correct attempts. An example of of for-loop than another student who practices (even the parameterized java exercise can be seen in Figure 1. same amount of times) the for loops concept in a more nar- row context, i.e., only with post-incremental operator. To In order to find the concepts inside all of the exercises, we represent our approach, for each student we build a graph- used a parser [10] that extracts concepts from the exercise’s based student model as a network of concepts where the template code, analyzes its abstract syntax tree (AST), and edges are created as the student succeeds in exercises con- maps the nodes of the AST (concepts extracted) to the nodes taining both concepts to be connected. The Domain Model in a Java ontology 2 . This ontology is a hierarchy of pro- defining the concept space and the mapping between the gramming concepts in the java domain and the parser uses concepts and programming exercises is explained in the next only the concepts in the leaf nodes of the hierarchy. section. The weight of the edges in the graph is computed as the overall success rate on exercises performed by the In total there are 138 concepts extracted and mapped to student which contain the pair of concepts. Pairs of con- QuizJet exercises. Examples of concepts are: Int Data Type, cepts that do not co-occur in exercises succeeded by the stu- Less Expression, Return Statement, For Statement, Subtract dent are not connected in her graph. In this representation, Expression, Constant, Constant Initialization Statement, If highly connected nodes are concepts successfully practiced Statement, Array Data Type, Constructor Definition, etc. with different other concepts. We also compute a measure We excluded 8 concepts which appear in all exercise tem- of weight for each node by taking the average weight among plates (for example “Class Definition” or “Public Class Spec- edges connecting the node. This measure of the success rate ifier” appear in the first line of all exercises). Each concept on concepts favors exercises that connect more concepts be- appears in one or more Java exercises. Each of the 112 ex- cause each exercise containing n concepts produce or affects ercises maps to 2 to 47 Java concepts. For example, the n(n − 1)/2 edges. For example, a success on an exercise hav- exercise “jwhile1”, shown in Figure 1, is mapped to 5 con- ing 10 concepts contributes to 45 edges, but a successful at- cepts: Int Data Type, Simple Assignment Expression, Less tempt to an exercise connecting 5 concepts only contributes Expression, While Statement, Post Increment Expression. to 10 edges. We hypothesize that in a graph built following 2 this approach, metrics like average degree, density, average http://www.sis.pitt.edu/~paws/ont/java.owl 3.2 Graph Metrics Table 1: Correlation between activity measures and grade To characterize the student knowledge graph we computed and between graph metrics and grade. * significance at 0.1, standard graph metrics listed below. ** significance at 0.05. Measure of Activity Corr. Coeff. Sig. (p) • Graph Density (density): the ratio of the number of Correct Attempts to Exercises .046 .553 edges and the number of possible edges. Distinct Correct Exercises .114 .147 • Graph Diameter (diameter): length of the longest Overall Success Rate .298 .000** shortest path between every pair of nodes. Avg. Success Rate on Concepts .188 .016** • Average Path Length (avg.path.len): average among the shortest paths between all pairs of nodes. Graph Metric Corr. Coeff. Sig. (p) • Average Degree (avg.degree): average among the Average Degree .150 .055* degree of all nodes in an undirected graph. Graph Density .102 .190 • Average Node Weight (avg.node.weight): the weight Graph Diameter .147 .081 of a node is the average of the weight of its edges. We Average Path Length .152 .052* then average the weights of all nodes in the graph. Average Node Weight .201 .010** 3.3 Measures of Activity To measure student activity so that it could be correlated 4. EXPERIMENTS AND RESULTS with the graph metrics we collected and calculated the fol- 4.1 Dataset lowing success measures: We collected student data over three terms of a Java Pro- gramming course using the system: Fall 2013, Spring 2014, • Correct Attempts to Exercises (correct.attempts): and Fall 2014. Since the system usage was not mandatory, total number of correct attempts to exercises. It in- we want to exclude students who just tried the system while cludes repetition of exercises as well. likely using other activities (not captured by the system) • Distinct Correct Exercises (dist.correct.attempts): for practicing Java programming. For this we looked at the number of distinct exercise attempted successfully. distribution of distinct exercises attempted and we exclude • Overall Success Rate (success.rate): the number all student below the 1st quartile (14.5 distinct exercises of correct attempts to exercises divided by the total attempted). This left 83 students for our analysis. In to- number of attempts. tal these students made 8,915 attempts to exercises. On • Average Success Rate on Concepts average, students have attempted about 55 (Standard Devi- (avg.concept.succ.rate): we compute the success rate ation=22) distinct exercises while performing an average of of each concept as the average success rate of the ex- 107 (SD=92) exercises attempts. On average, students have ercises containing the concept. Then we average this covered about 63 concepts with SD=25 (i.e., succeeded in at among all concepts in the domain model. least one exercise containing the concept), and have covered about 773 concept pairs with SD=772 (i.e., succeeded in at least one exercise covering the concept pair.) The average 3.4 Motivational Factors success rate ( #correct attempts ) across students is about 69% We use the revised Achievement-Goal Orientation question- #total attempts naire [8] which contains 12 questions in a 7-point Likert (SD=11%). scale. There are 3 questions for each of the 4 factors of the Achievement-Goal Orientation framework : Mastery - 4.2 Graph Metrics and Learning Approach, Mastery-Avoidance, Performance-Approach and We compare graph metrics (Avg. Degree, Graph Density, Performance-Avoidance. Mastery-Approach goal orien- Graph Diameter, Avg. Path Length and Avg Node Weight) tation relates to intrinsic motivation: “I want to learn this and measures of activity (Correct Attempts to Exercises, because it is interesting for me”, “I want to master this Distinct Correct Exercises, Overall Success Rate and Avg. subject”; Mastery-Avoidance relates to the attitude of Success Rate on Concepts) by computing the Kendall’s τB avoid to fail or avoid learning less than the minimum; Per- correlation of these metrics with respect to the students’ formance -Approach goal orientation stresses the idea of grade on the programming course. Results are displayed in having a good performance and relates well with social com- Table 1. parison: “I want to perform good in this subject”, “I want to be better than others here”; and Performance-Avoidance Surprisingly, the plain Overall Success Rate (which does oriented students avoid to get lower grades or avoid to per- not consider concepts disaggregation, nor graph informa- form worse than other students. The goal orientation of tion) is better correlated with course grade than any other a student helps to explain the behavior that the student measure. Students who succeed more frequently, get in gen- exposes when facing difficulty, but does not label the final eral better grades. Interestingly, both the Average Suc- achievement of the student. For example, if a student is cess Rate on Concepts and the Average Node Weight Mastery-Approach oriented, it does not necessarily mean are both significantly correlated with grade. This last mea- that the student reached the mastery level of the skill or sure uses the graph information and presents a slightly bet- knowledge. In our case, we believe the achievement-goal ori- ter correlation than the former, which does not consider the entation of the student can convey the tendency to pursue graph information. (or avoid) to solve more diverse (and more difficult) exer- cises, which contain more heterogeneous space of concepts, Among the other graph metrics, Average Degree and Av- thus contribute to form better connected graphs. erage Path Length are marginally correlated with course which extent the motivational profile of the student explains the graph’s shape. Step-wise regression models were used where the dependent variables are the graph metrics and the independent variables are the motivational factors. We found a significant model of the diameter of the graph (R2 = 0.161, F = 6.523, p = 0.006) with the factors Mastery- Avoidance (B = 0.952, p = 0.001) and Mastery-Approach (B = −0.938, p = 0.006). Note the negative coefficient for Mastery-Approach and the positive coefficient for Mastery- Avoidance. As the Achievement-Goal Orientation frame- Figure 2: Graph representation of two students. work suggests, Mastery-Approach oriented students are mo- tivated to learn more, tend to explore more content and do not give up easily when facing difficulties; Mastery-Avoidance Table 2: Graph metrics, measures of activity and motiva- students, in the other hand, do not cope well with diffi- tional scores of 2 students. culties and tend to give up. Then, a possible explanation Student A Student B of the results is that, in one hand, students with higher Graph Density 0.077 0.094 Mastery-Approach orientation are more likely to solve diffi- Graph Diameter 2.85 2.00 cult questions which connects more and more distant con- Avg. Path Length 1.77 1.78 cepts which decreases the graph diameter; and on the other Avg. Degree 8.64 10.55 hand, Mastery-Avoidance students avoid difficult exercises Avg. Node Weight 0.49 0.51 containing many concepts, thus making less connections and Correct Attempts 71 83 producing graphs with higher diameters. Correlations be- Dist. Correct Exercises 66 61 tween graph metrics and motivational factors confirmed the Overall Succ. Rate 0.82 0.76 relation between Mastery-Avoidance and Graph Diameter Avg. Succ.Rate on Concepts 0.50 0.53 (Kendall’s τB = 0.197, p = 0.030). Although these re- Mastery-Approach 0.83 0.78 sults are encouraging, they are not conclusive. For exam- Mastery-Avoidance 0.83 0.56 ple, Mastery-Approach students might just do more work, Performance-Approach 1.0 0.17 not necessarily targeting difficult questions. More analysis Performance-Avoidance 1.0 0.0 is needed to deeply explore these issues. Grade (%) 100 97 5. DISCUSSIONS AND CONCLUSIONS In this paper we proposed a novel approach to represent stu- grade (p values less than 0.1). Although this is a weak ev- dent model in the form of a dynamic graph of concepts that idence, we believe that we are in the good track. A higher become connected when the student succeed in assessment Average Degree means a better connected graph, thus it item containing a pair of concepts to be connected. The follows our idea that highly connected nodes signal more idea behind this approach is to strengthen the model for knowledge. Average Path Length is more difficult to in- those concepts that are applied in more different contexts, terpret. A higher Average Path Length means a less i.e., in assessment items containing other different concepts. connected graph (which contradicts our assumption), but We applied this approach to data of assessment items an- also, it can express students reaching more “rear” concepts swered by real students and analyzed the graph properties which appear in few more-difficult-exercises and generally comparing them to several performance measures such as have longer shortest paths. We think that further explo- course grade as well as motivational factors. Results showed ration of metrics among sub-graphs (e.g. a graph for an that this idea is potentially a good indicator of knowledge specific topic), and further refinement of the approach to of the students, but further refinement of the approach is build edges (e.g. connecting concepts that co-occur close to needed. We used several measures of the built graphs as each other in the exercise) could help to clarify these results descriptors of student knowledge level, and we found that a metric aggregating the success rates of the edges to the level Figure 2 shows the graphs of 2 students who have similar of concepts (nodes) is highly correlated to course grade, al- amount of distinct exercises solved correctly but present dif- though it does not beat the plain overall success rate of the ferent graph metrics and motivational profile. See metrics in student in assessment items. Table 2. Student B has more edges, lower diameter, higher density, higher degree, solved less questions more times. Stu- In the future work, we plan to repeat our analysis using dent A presents a less connected graph although she he/she more reliable approaches to construct the knowledge graph. solved more distinct questions (66 compared to 61 on Stu- One idea is to use rich information provided by the parser dent B). Student B has lower Mastery-Avoidance orientation (mapping between exercises and concepts) to ensure that score and lower Performance orientation scores than Student each new link connects concepts that interact considerably A, which could explain why Student B work result in a bet- in the program code. This could be done by controlling ter connected graph. Analyses of Motivational factors are the concepts proximity in the question code (e.g. only con- described in the following section. sider co-occurrence when concepts are close to each other in the parser tree.) Another approach to keep more reliable 4.3 Metrics and Motivation edges is to consider only a subset of important concepts We now explore the relationship between motivational fac- for each problem using feature selection techniques. Also tors and the graphs of the students. The idea is to see to we plan to perform analyses of sub-graphs targeting specific “zones” of knowledge. For example, a partial graph with [7] P. Dolog, N. Henze, W. Nejdl, and M. Sintek. The only concepts that belongs to a specific topic, or concepts personal reader: Personalizing and enriching learning that are prerequisites of a specific concept. Another inter- resources using semantic web technologies. In esting idea relates to recommendation of content: guide the P. De Bra and W. Nejdl, editors, Adaptive Hypermedia and Adaptive Web-Based Systems, volume 3137 of student to questions that will connect the isolated parts of Lecture Notes in Computer Science, pages 85–94. the knowledge graph or minimize the average path length of Springer Berlin Heidelberg, 2004. the graph. Along the same lines, the analysis of the graph [8] A. J. Elliot and K. Murayama. On the measurement shortest paths and overall connectivity can help in designing of achievement goals: Critique, illustration, and assessment items that better connect distant concepts. application. Journal of Educational Psychology, 100(3):613, 2008. [9] B. Hoser, A. Hotho, R. Jäschke, C. Schmitz, and 6. REFERENCES G. Stumme. Semantic network analysis of ontologies. [1] N. Belacel, G. Durand, and F. Laplante. A binary Springer, 2006. integer programming model for global optimization of [10] R. Hosseini and P. Brusilovsky. Javaparser: A learning path discovery. In Workshop on Graph-Based fine-grain concept indexing tool for java exercises. In Educational Data Mining. The First Workshop on AI-supported Education for [2] P. Brusilovsky and E. Millán. User models for Computer Science (AIEDCS 2013), pages 60–63, 2013. adaptive hypermedia and adaptive educational [11] I.-H. Hsiao, S. Sosnovsky, and P. Brusilovsky. Guiding systems. In P. Brusilovsky, A. Kobsa, and W. Nejdl, students to the right questions: adaptive navigation editors, The Adaptive Web, volume 4321 of Lecture support in an e-learning system for java programming. Notes in Computer Science, chapter 1, pages 3–53. Journal of Computer Assisted Learning, Springer Berlin Heidelberg, Berlin, Heidelberg, 2007. 26(4):270–283, 2010. [3] V. Cateté, D. Hicks, C. Lynch, and T. Barnes. [12] S. Jiang, S. M. Fitzhugh, and M. Warschauer. Social Snag’em: Graph data mining for a social networking positioning and performance in moocs. In Workshop game. In Workshop on Graph-Based Educational Data on Graph-Based Educational Data Mining, page 14. Mining, volume 6, page 10. [13] T. Käser, S. Klingler, A. G. Schwing, and M. Gross. [4] C. Conati, A. Gertner, and K. Vanlehn. Using Beyond knowledge tracing: Modeling skill topologies bayesian networks to manage uncertainty in student with bayesian networks. In Intelligent Tutoring modeling. User modeling and user-adapted interaction, Systems, pages 188–198. Springer, 2014. 12(4):371–417, 2002. [14] V. Kovanovic, S. Joksimovic, D. Gasevic, and [5] R. Dekel and Y. Gal. On-line plan recognition in M. Hatala. What is the source of social capital? the exploratory learning environments. In Workshop on association between social network position and social Graph-Based Educational Data Mining. presence in communities of inquiry. 2014. [6] M. C. Desmarais and M. Gagnon. Bayesian student [15] E. Millán, T. Loboda, and J. L. Pérez-de-la Cruz. models based on item to item knowledge structures. Bayesian networks for student model engineering. Springer, 2006. Computers & Education, 55(4):1663–1683, 2010. Graph-based Modelling of Students’ Interaction Data from Exploratory Learning Environments Alexandra Poulovassilis Sergio Gutierrez-Santos Manolis Mavrikis London Knowledge Lab London Knowledge Lab London Knowledge Lab Birkbeck, Univ. of London Birkbeck, Univ. of London UCL Institute of Education ap@dcs.bbk.ac.uk sergut@dcs.bbk.ac.uk m.mavrikis@lkl.ac.uk ABSTRACT learning of algebraic generalisation [15]; and the iTalk2Learn Students’ interaction data from learning environments has system that aims to support 8-10 year old students’ learning an inherent temporal dimension, with successive events be- of fractions [7]. Both systems provide students with math- ing related through the “next event” relationship. Exploratory ematical microworlds in which they undertake construction learning environments (ELEs), in particular, can generate tasks: in MiGen creating 2-dimensional tiled models using very large volumes of such data, making their interpretation a tool called eXpresser and in iTalk2learn creating fractions a challenging task. Using two mathematical microworlds using the FractionsLab tool. In eXpresser, tasks typically re- as exemplars, we illustrate how modelling students’ event- quire the construction of several models, moving from spe- based interaction data as a graph can open up new querying cific models involving specific numeric values to a general and analysis opportunities. We demonstrate the possibilities model involving the use of one or more variables; in parallel, that graph-based modelling can provide for querying and students are asked to formulate algebraic rules specifying analysing the data, enabling investigation of student-system the number of tiles of each colour that are needed to fully interactions and leading to the improvement of future ver- colour their models. In FractionsLab, tasks require the con- sions of the ELEs under investigation. struction, comparison and manipulation of fractions, and students are encouraged to talk aloud about aspects of their constructions, such as whether two fractions are equivalent. Keywords Exploratory Learning Environments, Interaction Data, Graph Both systems include intelligent components that provide Modelling different levels of feedback to students, ranging from un- solicited prompts and nudges, to low-interruption feedback 1. INTRODUCTION that students can choose to view if they wish. The aim Much recent research has focussed on Exploratory Learn- of this feedback is to balance students’ freedom to explore ing Environments (ELEs) which encourage students’ open- while at the same time providing sufficient support to en- ended interaction within a knowledge domain, coupled with sure that learning is being achieved [9]. The intelligent sup- intelligent techniques that aim to provide pedagogical sup- port is designed through detailed cognitive task analysis and port to ensure students’ productive interaction [9]. The data Wizard-of-Oz studies [13], and it relies on meaningful indica- gathered from students’ interactions with such ELEs pro- tors being detected as students are undertaking construction vides a rich source of information for both pedagogical and tasks. Examples of such indicators in MiGen are ‘student technical research, to help understand how students are us- has made a building block’ (part of a model), ‘student has ing the ELE and how the intelligent support that it provides unlocked a number’ (i.e. has created a variable), ‘student may be enhanced to better support students’ learning. has unlocked too many numbers for this task’; while ex- amples of such indicators in FractionsLab are ‘student has In this paper, we consider how modelling students’ event- created a fraction’, ‘student has changed a fraction’ (numer- based interaction data as a graph makes possible graph- ator or denominator), ‘student has released a fraction’ (i.e. based queries and analyses that can provide insights into has finished changing it). the ways that students are using the affordances of the sys- tem and the effects of system interventions on students’ be- Teacher Assistance tools can subscribe to receive real-time haviour. Our case studies are two intelligent ELEs: the information relating to occurrences of indicators for each MiGen system, that aims to foster 11-14 year old students’ student, and can present aspects of this information visu- ally to the teacher [8]. Indicators are either task independent (TI) or task dependent (TD). The former refer to aspects of the student’s interaction that are related to the microworld itself and do not depend on the specific task the student is working on, while the latter require knowledge of the task the student is working on, may relate to combinations of student actions, and their detection requires intelligent rea- soning to be applied (a mixture of case-based, rule-based and probablistic techniques). Detailed discussions of MiGen’s TI and TD indicators and how the latter are inferred may be Event EventType found in [8]. dateTime eventID taskID eventStatus occurrenceOf In this paper we explore how graph-based representation of constrID eventCat userID event-based interaction data arising from ELEs such as Mi- sessionID Gen and FractionsLab can aid in the querying and analysis of such data, with the aim of exploring both the behaviours next of the students in undertaking the exploratory learning tasks set and the effectiveness of the intelligent support being provided by the system to the students. Data relating to Figure 1: Core Graph Data Model learning environments has often been modelled as a graph in previous work, for example in [10] for providing support to moderators in e-discussion environments; in [16, 18] for There is a relationship ‘next’ linking an instance of Event supporting learning of argumentation; in [17] for modelling to the next Event that occurs for the same user, task and data and metadata relating to episodes of work and learning session. There is a relationship ‘occurrenceOf’ linking each in a lifelong learning setting; in [1] for learning path discov- instance of Event to an instance of the EventType class. ery as students “navigate” through learning objects; in [3] for recognising students’ activity planning in ELEs; and in [23] The instances of the EventType class include: startTask, for gaining better understanding of learners’ interactions and endTask, numberCreated, numberUnlocked, unlockedNum- ties in professional networks. berChanged, buildingBlockMade, correctModelRuleCreated, incorrectModelRuleCreated, interventionGenerated, interven- Previous work that is close to ours is the work on interac- tionShown, in the case of eXpresser (see [8] for the full list); tion networks and hint generation [6, 21, 20, 4, 5], in which and startTask, endTask, fractionCreated, fractionChanged, the graphs used consist of nodes representing states within a fractionReleased, inverventionShown, in the case of Frac- problem-solving space and edges representing students’ ac- tionsLab. tions in transitioning between states. This approach targets learning environments where students are required to select We see that instances of the EventType class have several and apply rules, and the interaction network aims to rep- attributes, including: resent concisely information relating to students’ problem- solving sequences in moving from state to state. Our focus here differs from this in that we are using graphs to model • eventID: a unique numerical identifier for each type of fine-grained event-based interaction data arising from ELEs. indicator; In our graphs, nodes are used to represent indicator occur- rences (i.e. events, not problem states) and edges between • eventStatus: this may be -1, 0 or 1, respectively stat- such nodes represent the “next event” relationship. Also, ing that an occurrence of this type of indicator shows rather than using the information derived from querying and that the student is making negative, neutral or posi- analysing this data to automatically generate hints, our fo- tive progress towards achieving the task goals; an addi- cus is on investigating how students are using the system tional status 2 is used for indicators relating to system and the effects of the system’s interventions in order to un- interventions; derstand how students interact with the ELEs and improve their future versions. • eventCat: the category into which this indicator type falls; for example, startTask and endTask are task- related indicators; interventionGenerated and inter- 2. GRAPH-BASED MODELLING ventionShown are system-related ones; numberCreated, Figure 1 illustrates our Graph Data Model for ELE interac- numberUnlocked, unlockedNumberChanged are number- tion data. We see two classes of nodes: Event — represent- related; and fractionCreated, fractionChanged, frac- ing indicator occurrences; and EventType — representing tionReleased are fraction-related. different indicator types. The instances of the Event class are occurrences of indicators that are detected or generated Figure 2 shows a fragment of MiGen interaction data con- by the system as each student undertakes a task. We see forming to this graph data model. Specifically, it relates to that instances of Event have several attributes: dateTime: the interactions of user 5 as he/she is working on task 2 the date and time of the indicator occurrence; userID: the during session 9. The user makes three constructions during student it relates to; sessionID: the class session that the this task (with constrIDs 1, 2 and 3). The start and end student was participating in at the time; taskID: the taskID of the task are delimited by an occurrence of the startTask that the student was working on; and constrID: the con- and endTask indicator type, respectively — events 23041 struction that the student was working on1 . and 33154. We see that the two events following 23041 re- late to an intervention being generated and being shown to 1 The model in Fig. 1 focusses on the interaction data. The the student (this is likely to be because the student was in- full data relating to ELEs such as eXpresser and Fraction- active for over a minute after starting the task); following sLab would also include classes relating to users, tasks, ses- which, the student creates a number — event 24115. sions and constructions; and attributes describing instances of these classes, such as a user’s name and year-group, a task’s name and description, a construction’s content and There are additional attributes relating to events, not shown description, and a session’s description and duration. here for simplicity, capturing values relating to the student’s 23041 344712 startTask startTask dateTime: occurrenceOf dateTime: occurrenceOf 20150331091524 eventID:0 20150215091741 eventID:0 taskID:2 eventStatus:0 taskID:56 eventStatus:0 eventCat:taskEv next eventCat:taskEv constrID:1 constrID:4 userID:5 userID:5 sessionID:9 sessionID:1 intervention- intervention- next Generated Shown ... fractionChanged fractionReleased 23921 344758 occurrenceOf eventID:6001 eventID:6002 occurrenceOf eventID:1002 eventID:1003 dateTime: eventStatus:2 eventStatus:2 dateTime: eventStatus:1 eventStatus:1 20150331091637 eventCat:systemEv eventCat:systemEv next 20150215091828 eventCat:sfractionEv eventCat:fractionEv taskID:2 taskID:56 constrID:1 constrID:4 userID:5 userID:5 sessionID:9 occurrenceOf numberCreated sessionID:1 occurrenceOf interventionShown eventID:1006 eventID:6002 next eventStatus:1 next occurrenceOf eventStatus:2 23923 occurrenceOf eventCat:numberEv 344759 eventCat:systemEv dateTime: 24115 dateTime: 344760 20150331091638 dateTime: 33154 20150215091828 dateTime: 344761 taskID:2 20150331091655 taskID:56 20150215091832 constrID:1 dateTime: occurrenceOf constrID:4 dateTime: taskID:2 20150331094453 taskID:56 20150215091833 occurrenceOf userID:5 constrID:1 userID:5 constrID:4 sessionID:9 next userID:5 ... taskID:2 endTask sessionID:1 next userID:5 taskID:56 clickButton constrID:3 eventID:9999 constrID:4 eventID:3002 sessionID:9 next userID:5 sessionID:1 next userID:5 eventStatus:0 eventStatus:0 sessionID:9 eventCat:taskEv sessionID:1 eventCat:taskEv Figure 2: Fragment of Graph Data Figure 3: Fragment of Graph Data of a node would be represented by an edge and its value by constructions and information relating to the system’s in- a literal-valued node. So, for example, the information that terventions. For example, for event 24115, the value of the the taskID of event 23041 is 2 would be represented by an number created, say 5; for event 23921, the feedback strat- taskID edge 23041 −−−−−→ 2. The query examples in the next section egy used by the system to generate this intervention, say assume this “classical” graph representation. strategy 8; and for event 23923, the content of the message displayed to the user, say “How many green tiles do you need to make your pattern?” and whether this is a high-level in- 3. GRAPH QUERIES AND ANALYSES terruption by the system or a low-level interruption that Because the sub-graph induced by edges labelled ‘next’ con- the student can choose to view or not. Such information sists of a set of paths, the data readily lends itself to explo- can be captured through additional edges outgoing from an ration using conjunctive regular path (CRP) queries [2]. A value event instance to a literal-valued node: 24115 −−−→ 5, 23921 CRP query, Q, consisting of n conjuncts is of the form strategy message −−−−−−→ 8, 23932 −−−−−− → “How many green tiles do you need (Z1 , . . . , Zm ) ← (X1 , R1 , Y1 ), . . . , (Xn , Rn , Yn ) level to make your pattern?”, 23932 −−−→ “high”. Since graph data where each Xi and Yi is a variable or a constant, each Zi is models are semi-structured (and graph data therefore does a variable that appears also in the right hand side of Q, and not need to strictly conform to a single schema), this kind each Ri is a regular expression over the set of edge labels. of heterogeneity in the data is readily accommodated. In this context, a regular expression, R, has the following syntax: Figure 3 similarly shows a fragment of FractionsLab inter- action data, relating to the interactions of user 5 working R :=  | a | | (R1 .R2 ) | (R1 |R2 ) | R∗ | R+ on task 56 during session 1. The user makes one construc- where  denotes the empty string, a denotes an edge label, tion during this task. We see events relating to the stu- denotes the disjunction of all edge labels, and the operators dent changing and ‘releasing’ a fraction. Following which have their usual meaning. The answer to a CRP query on a the system displays a message (in this case, it was a high- graph G is obtained by finding for each 1 ≤ i ≤ n a binary interruption message of encouragement “Great! Well Done”). relation ri over the scheme (Xi , Yi ), where there is a tuple (x, y) in ri if and only if there is a path from x to y in G We see from Figures 2 and 3 that the sub-graph induced by such that: x = Xi if Xi is a constant; y = Yi if Yi is a edges labelled ‘next’ consists of a set of paths, one path for constant; and the concatenation of the edge labels in the each task undertaken by a specific user in a specific session. path satisfies the regular expression Ri . The answer is then The entire graph is a DAG (directed acyclic graph): there given by forming the natural join of the binary relations are no cycles induced by the edges labelled ‘next’ since each r1 , . . . , rn and finally projecting on Z1 , . . . , Zm . links an earlier indicator occurrence to a later one; while the instances of EventType and other literal-valued nodes To illustrate, the following CRP query returns pairs of events can have only incoming edges. The entire graph is also a x, y such that x is an intervention message shown to the user bipartite graph, with the two parts comprising (i) the in- by the system and y indicates that the user’s next action – stances of Event, and (ii) the instances of EventType and in eXpresser – was to create a number (note, variables in the literal-valued nodes. queries are distinguished by an initial question mark): As a final observation, we note that Figures 1 – 3 adopt a “property graph” notation (e.g. as used in the Neo4J (?X,?Y) <- (?X,occurrenceOf,interventionShown), graph database, neo4j.com) in which nodes may have at- (?X,next,?Y), tributes. In a “classical” graph data model, each attribute (?Y,occurrenceOf,numberCreated) The result would contain pairs such as (23923,24115) from (?X,?Y,?Z) <- (?X,occurrenceOf,interventionGenerated), Figure 2, demonstrating that there are indeed situations (?X,constrID,?C), (?X,next+,?Y), where an intervention message displayed by the MiGen sys- (?Y,constrlID,?C), (?Y,occurrenceOf,?Z) tem leads directly to the creation of a number by the student. The following query returns pairs of events x, y such that The result would contain triples such as that x is an intervention message shown to the user by the (23921, 23923, interventionShown), system and y is the user’s next action; the type of y is also (23921, 24115, numberCreated), returned, through the variable ?Z: (23921, 24136, numberUnlocked), (23921, 24189, unlockedNumberChanged), relating to construction 1 made by user 5 during session 9 (?X,?Y,?Z) <- (?X,occurrenceOf,interventionShown), for task 2 (two more events — 24136 and 24189 — relat- (?X,next,?Y), ing to construction 1 have been assumed here, in addition (?Y,occurrenceOf,?Z) to 23923 amd 24115 shown in Figure 2, for illustrative pur- poses). The results would not contain (23921,33154,end- The result would contain triples such as (23923,24115,num- Task), since event 33154 relates to construction 3. berCreated) from Figure 2 and (344760,344761,clickButton) from Figure 3, allowing researchers to see what types of To show more clearly the answers to the previous query in events directly follow the display of an intervention mes- the form of possible event paths, we can use extended regular sage. This would allow the confirmation or contradiction of path (ERP) queries [11], in which a regular expression can researchers’ expectations regarding the immediate effect of be associated with a path variable and path variables can intervention messages on students’ behaviours. appear in the left-hand-side of queries. Thus, for example, the following query returns the possible paths from x to y: Focussing for the rest of this section on the data in Figure 2, the following query returns pairs of events x, y such that x is (?X,?P,?Y,?Z) <- any type of event and y indicates that the user’s next action (?X,occurrenceOf,interventionGenerated), was to unlock a number; the type of x is also returned, (?X,constrID,?C), (?X,next+:?P,?Y), through the variable ?Z: (?Y,constrID,?C), (?Y,occurrenceOf,?Z) (?X,?Y,?Z) <- (?X,occurrenceOf,?Z), (?X,next,?Y), The result would contain answers such as (?Y,occurrenceOf,numberUnlocked) (23921, [next], 23923, interventionShown), (23921, [next, 23923, next], 24115, numberCreated), (23921, [next, 23923, next, 24115, next], 24136, numberUn- The result would allow researchers to see what types of locked), events immediately precede the unlocking of a number (i.e. (23921, [next, 23923, next, 24115, next, 24136, next], 24189, the creation of a variable). This would allow confirmation unlockedNumberChanged). of researchers’ expectations about the design of the MiGen system’s intelligent support in guiding students towards gen- The use of the regular expressions next and next+ in the eralising their models by changing a fixed number to an ‘un- previous queries matches precisely one edge labelled ‘next’, locked’ one. or any number of such edges (greater than or equal to 1), respectively. However, for finer control and ranking of query The following query returns pairs of events x, y such that answers, it is possible to use approximate answering of CRP that x is an intervention generated by the system and y is and ERP queries (see [11, 17]), in which edit operations such any subsequent event linked to x through a path comprising as insertion, deletion or substitution of an edge label can be one or more ‘next’ edges; the type of y is also returned, applied to regular expressions. through the variable ?Z: For example, using the techniques described in [11, 17], the (?X,?Y,?Z) <- (?X,occurrenceOf,interventionGenerated), user can chose to allow the insertion of the label ‘next’ into (?X,next+,?Y), a regular expression, at an edit cost of 1. Submitting then (?Y,occurrenceOf,?Z) this query: The result would contain triples such as (23921, 23923, inter- (?X,?P,?Y,?Z) <- ventionShown), (23921, 24115, numberCreated), ... (23921, (?X,occurrenceOf,interventionGenerated), 33154, endTask), allowing researchers to see what types of (?X,constrID,?C), APPROX(?X,next:?P,?Y), events directly or indirectly follow the display of an interven- (?Y,constrID,?C), (?Y,occurrenceOf,?Z) tion message by the system. This would allow the confirma- tion or contradiction of researchers’ expectations regarding the longer-term effect of intervention messages on students’ would return first exact answers, such as behaviours. (23921, [next], 23923, interventionShown). The regular ex- pression next in the conjunct APPROX(?X,next:?P,?Y) would We can modify the query to retain only pairs x, y that relate then be automatically approximated to next.next, leading to the same construction: to answers such as Transition Matrix(Session 3) (23921, [next, 23923, next], 24115, numberCreated) 6003 e s 6002 1001 at an edit distance of 1 from the original query. Following 6001 1002 this, the regular expression next.next would be automati- 5004 1003 cally approximated to next.next.next, leading to answers 5003 1004 such as 5002 1005 (23921, [next, 23923, next, 24115, next], 24136, numberUn- locked) 5001 1006 at distance 2. This incremental return of paths of increas- 3009 1007 ing length can continue for as long as the user wishes, and 3008 1008 allows researchers to examine increasingly longer-term ef- 3007 1009 fects of intervention messages on students’ behaviours. It 3006 1010 3002 1011 would also be possible for users to specify from the outset a 1015 1014 minimum and maximum edit distance to be used in approx- 0.96342 0.00024384 imating and evaluating the query, for example to request paths encompassing between 2 and 4 edges labelled ‘next’. Figure 4: Transitions between Event Types Queries based on evaluating regular expressions over a graph- based representation of interaction data, such as those above, can aid in the exploration of students’ behaviours as they are applied across a whole dataset, or focussing on partic- undertaking tasks using ELEs and the effectiveness of the ular students, tasks or sessions; intelligent support being provided by the ELE. The query • nodes that have a high probability of being visited on a processing techniques employed are based on incremental randomly chosen shortest path between two randomly query evaluation algorithms which run in polynomial time chosen nodes have high betweenness centrality; deter- with respect to the size of the database graph and the size mining this measure for pairs of event type nodes (ig- of the query and which return answers in order of increasing noring the directionality of the ‘occurrenceOf’ edges) edit distance [11]. A recent paper [19] gives details of an would identify event types that play key mediating implementation, which is based on the construction of an roles between other event types. automaton (NFA) for each query conjunct, the incremental construction of a weighted product automaton from each conjunct’s automaton and the data graph, and the use of We have already undertaken some ad hoc analyses of in- a ranked join to combine answers being incrementally pro- teraction data arising from classroom sessions using ELEs. duced from the evaluation of each conjunct. The paper also For example, Figure 4 shows the normalised incoming tran- presents a performance study undertaken on two data sets sitions for a 1-hour classroom session involving 22 students — lifelong learning data and metadata [17] and YAGO [22]. using MiGen (in the diagram, s denotes the ‘startTask’ and The first of these has rather ‘linear’ data, similar to the in- e the ‘endTask’ event types). Event types with an adja- teraction data discussed here, while the second has ‘bushier’ cent circle show transitions where this type of event occurs connectivity. Query performance is generally better for the repeatedly in succession. The thickness of each arrow or former than the latter, and the paper discusses several pos- circle indicates the value of the transition probability: the sible approaches towards query optimisation. thicker the line, the higher the probability. Red (light grey) is used for probabilities < 0.2 and black for probabilities In addition to evaluating queries over the interaction data, ≥ 0.2. We can observe a black arrow 3007 → 1005, indicat- by representing the data in the form of a graph it is possible ing transitions from events of type 3007 (detection by the to apply graph structure analyses such as the following: system that the student has made an implausible building block for this task) to events of type 1005 (modification of a • path finding and clustering: this would be useful for rule by the student). Such an observation raises a hypoth- determining patterns of interest across a whole dataset, esis for more detailed analysis or further student observa- or focussing on particular students, tasks or sessions tion, namely: “does the construction of an incorrect building c.f. [4]; block lead students to self-correct their rules?”. Developing a better understanding of such complex interaction can lead • average path length: this would be useful for determin- to improvement of the system. For this particular example, ing the amount of student activity (i.e. the number of we designed a new prompt that suggests to students to first indicator occurrences being generated per task) across consider the building block against the given task before a whole dataset, or focussing on particular students, proceeding unnecessarily in correcting their rules. More ex- tasks or sessions; amples of such ad hoc analyses are given in [14]. Represent- ing the interaction data in graph form will allow more sys- • graph diameter: to determine the greatest distance be- tematic, flexible and scalable application of graph-structure tween any two nodes (which, due to the nature of the algorithms such as those identified above. data, would be event type nodes); this would be an in- dication the most long-running and/or most intensive task(s); 4. CONCLUSIONS AND FUTURE WORK We have presented a graph model for representing event- • degree centrality: determining the in-degree centrality based interaction data arising from Exploratory Learning of event type nodes would identify key event types oc- Environments, drawing on the data generated when students curring in students’ interactions; this analysis could be undertake exploratory learning tasks with the eXpresser and FractionsLab microworlds. Although developed in the con- networks: Generating high level hints based on text of these systems, the model is a very general one and network community clustering. EDM, 2012. can easily be used or extended to model similar data from [7] B. Grawemeyer and et al. Light-bulb moment?: other ELEs. Towards adaptive presentation of feedback based on students’ affective state. IUI, pages 400–404, 2015. We have explored the possibilities that evaluating regular [8] S. Gutierrez-Santos, E. Geraniou, D. Pearce-Lazard, path queries over this graph-based representation might pro- and A. Poulovassilis. Design of Teacher Assistance vide for exploring the behaviours of students as they are Tools in an Exploratory Learning Environment for working in the ELE and the effectiveness of the intelligent Algebraic Generalization. IEEE Trans. Learn. Tech., support that it provides to them. We have also identified 5(4):366–376, 2012. additional graph algorithms that may yield further insights [9] S. Gutierrez-Santos, M. Mavrikis, and G. D. Magoulas. about learners, tasks and significant indicators. A Separation of Concerns for Engineering Intelligent Support for Exploratory Learning Environments. J. Planned worked includes transformation and uploading of Research and Practice in Inf. Tech., 44:347–360, 2013. the interaction data sets gathered during trials and full class- [10] A. Harrer, R. Hever, and S. Ziebarth. Empowering room sessions of the two systems into an industrial-strength researchers to detect interaction patterns in graph database such as Neo4J, following the graph model e-collaboration. Frontiers in Artificial Intelligence and presented in Section 2; followed by the design, implemen- Applications, 158:503, 2007. tation and evaluation of meaningful queries, analyses and [11] C. Hurtado, A. Poulovassilis, and P. Wood. Finding visualisations over the graph data, building on the work top-k approximate answers to path queries. FQAS, presented in Section 3. Equipped with an appropriate user pages 465–476, 2009. interface, educational researchers, designers or even teach- ers with less technical expertise could in this way explore [12] K. Koedinger and et al. A data repository for the the data from their perspective. This has the potential to EDM community: The PSLC datashop. Handbook of lead to an improved understanding of interaction in this Educational Data Mining, 43, 2010. context and to feed back to the design of the ELEs. We [13] M. Mavrikis and S. Gutierrez-Santos. Not all Wizards see this approach very much in the spirit of “polyglot per- are from Oz: Iterative design of intelligent learning sistence” (i.e. using different data storage methods to ad- environments by communication capacity tapering. dress different data manipulation problems), and hence be- Computers and Education, 54(3):641–651, 2010. ing used in conjunction with other EDM resources such as [14] M. Mavrikis, Z. Zheng, S. Gutierrez-Santos, and DataShop [12]. Another direction of research is investigation A. Poulovassilis. Visualisation and analysis of of how the flexible querying processing techniques for graph students’ interaction data in exploratory learning data (including both query approximation and query relax- environments. Workshop on Web-Based Technology ation) that have been developed in the context of querying for Training and Education (at WWW), 2015. lifelong learners’ data and metadata [11, 17] might be ap- [15] R. Noss and et al. The design of a system to support plied or adapted to the much finer-granularity interaction exploratory learning of algebraic generalisation. data described here and the more challenging pedagogical Computers and Education, 59(1):63–82, 2012. setting of providing effective intelligent support to learners [16] N. Pinkwart and et al. Graph grammars: An ITS undertaking exploratory tasks in ELEs. technology for diagram representations. FLAIRS, pages 433–438, 2008. Acknowledgments [17] A. Poulovassilis, P. Selmer, and P. Wood. Flexible querying of lifelong learner metadata. IEEE Trans. This work has been funded by the ESRC/EPSRC MiGen Learn. Tech., 5(2):117–129, 2012. project, the EU FP7 projects iTalk2Learn (#318051) and M C Squared (#610467). We thank all the members of [18] O. Scheuer and B. McLaren. CASE: A configurable these projects for their help and insights. argumentation support engine. IEEE Trans. Learn. Tech., 6(2):144–157, 2013. [19] P. Selmer, A. Poulovassilis, and W. P.T. Implementing 5. REFERENCES flexible operators for regular path queries. GraphQ [1] N. Belacel, G. Durand, and F. LaPlante. A binary (EDBT/ICDT Workshops), pages 149–156, 2015. integer programming model for global optimization of [20] V. Sheshadri, C. Lynch, and T. Barnes. InVis: An learning path discovery. G-EDM, 2014. EDM tool for graphical rendering and analysis of [2] D. Calvanese and et al. Containment of conjunctive student interaction data. G-EDM, 2014. regular path queries with inverse. KRR, pages [21] J. Stamper, M. Eagle, T. Barnes, and M. Croy. 176–185, 2015. Experimental evaluation of automatic hint generation [3] R. Dekel and K. Gal. On-line plan recognition in for a logic tutor. Artificial Intelligence in Education, exploratory learning environments. G-EDM, 2014. pages 345–352, 2011. [4] M. Eagle and T. Barnes. Exploring differences in [22] F. Suchanek, G. Kasneci, and G. Weikum. YAGO: a problem solving with data-driven approach maps. core of semantic knowledge. WWW, 2007. EDM, 2014. [23] D. Suthers. From contingencies to network-level [5] M. Eagle, D. Hicks, B. Peddycord III, and T. Barnes. phenomena: Multilevel analysis of activity and actors Exploring networks of problem-solving interactions. in heterogeneous networked learning environments. LAK, pages 21–30, 2015. LAK, 2015. [6] M. Eagle, M. Johnson, and T. Barnes. Interaction Exploring the function of discussion forums in MOOCs: comparing data mining and graph-based approaches Lorenzo Vigentini Andrew Clayphan Learning & Teaching Unit Learning & Teaching Unit UNSW Australia, UNSW Australia, Lev 4 Mathews, Kensington 2065 Lev 4 Mathews, Kensington 2065 +61 (2) 9385 6226 +61 (2) 9385 6226 l.vigentini@unsw.edu.au a.clayphan@unsw.edu.au ABSTRACT learning environments [6]. These can be deployed in a variety of In this paper we present an analysis (in progress) of a dataset ways, ranging from a tangential support resource which students containing forum exchanges from three different MOOCs. The can refer to when they need help, to a space for learning with forum data is enhanced because together with the exchanges and others, driven by the activities students have to carry out (usually the full text, we have a description of the design and pedagogical sharing work and eliciting feedback). The latter, in a sense, function of forums in these courses and a certain level of detail emulates class-time in traditional courses providing a space for about the users, which includes achievement, completion, and in structured discussions about the topics of the course. One could some instances more details such as: education; employment; age; argue that like in face-to-face classes, the value of the interaction and prior MOOC exposure. depends on the importance attributed to the forums by the instructors. This is an interesting point to explore teachers’ Although a direct comparison between the datasets is not possible presence and the value of their input in directing such because the nature of the participants and the courses are conversations. Mazzolini & Maddison characterize the role of the different, what we hope to identify using graph-based techniques teacher and teacher presence in online discussion forums as is a characterization of the patterns in the nature and development varying from being the ‘sage on the stage’, to the ‘guide on the of communication between students and the impact of the ‘teacher side’ or even ‘the ghost in the wings’ [7]. Furthermore they argue presence’ in the forums. With the awareness of the differences, we that the ‘ideal’ degree of visibility of the instructor in discussion hope to demonstrate that student engagement can be directed ‘by- forums depends on the purpose of forums and their relationship to design’ in MOOCs: teacher presence should therefore be planned assessment. There are also a number of accounts indicating that carefully in the design of large-scale courses. students’ learning in forums is not very effective [8, 9]. However if one looks at the data there are numerous examples indicating Keywords that behaviours in forums are good predictors of performance in MOOCs, Discussion forums, graph-based EDM, pedagogy. the courses using them, particularly if forum activities are assessed [10,11,12,13]. Yet, forums in MOOCs tend to attract only a small portion of the student activity [14]. This is setting 1. INTRODUCTION forums in MOOCs apart from ‘tutorial-type’ forums used to In the past couple of years MOOCs (Massive Open Online support students’ learning in online or blended courses in higher Courses) have become the center of much media hype as education. Furthermore, some argue that active engagement is not disruptive and transformational [1, 2]. Although the focus has the only way of benefiting from discussion forums [15] and been on a few characteristics of the MOOCS – i.e. free courses, students’ characteristics and preferences could be more important massive numbers, massive dropouts and implicit quality than the course design in determining the way in which they take warranted by the status of the institutions delivering these courses full advantage of online resources [16]. – a rapidly growing research interest has started to question the effectiveness of MOOCS for learning and their pedagogies. If one ignores entirely the philosophies of teaching driving the design 2. THE THREE MOOCS IN DETAIL and delivery of MOOCs going from the the socio-constructivist In order to investigate the way in which students use the (cMOOC, [4, 5]) to instructivist (xMOOC, [3]), at the practical discussion forums, we have extracted data from three MOOCS level, instructors have to make specific choices about how to use delivered by a large, research intensive Australian university. The the tools available to them. One of these tools is the discussion three courses are: P2P (From Particles to Planets - Physics); forum. Forums are one of the most popular asynchronous tools to LTTO (Learning to Teach Online); and INTSE (Introduction to support students’ communication and collaboration in web-based Systems Engineering), which are broadly characterised in the top of Table 1. The courses were specifically designed in quite different ways to test hypotheses about their design, delivery and effectiveness. In particular, P2P was designed emulating a traditional university course in a sequential manner. All content was released on a week-by-week basis dictating the pace of instruction. LTTO and INTSE, instead were designed to provide a certain level of flexibility for the students to elect their learning paths. All content was readily available at the start, however for LTTO, the delivery in forum is 4x in magnitude compared to the other courses. Yet, if followed a week-on-week delivery focusing on the interaction we look at the average amount of posts or comments, the patterns with students and a selective attention to particular weekly topics are not straightforward to interpret, as the level of engagement is (i.e. weekly feedback videos driven by the discussion forums as similar across the courses with 3 to 5 posts per student and 1 to 3 well as weekly announcements). Although announcements were comments (i.e. replies to existing posts), but with P2P showing a used also in INTSE, the lack of weekly activities in the forums did higher level of engagement than the other courses. One possible not impose a strong pacing. In INTSE, the forums had only a explanation is the different target group of the different courses tangential support value and were used mainly to respond to with INTSE including a majority of professional engineers with students’ queries and to clarify specific topics emerging from the postgraduate qualifications, P2P focusing on high school student quizzes. Table 1 provides an overview of the different courses. and teachers, and LTTO targeting a broad base of teachers across This also shows that the forum activity in the various courses is a different educational levels. very small portion of all actions emerging from the logs of activity which has been reported in the literature [9]. INTSE LTTO P2P Teachers at all High school Target group Engineers 3. DETAILS OF THE DATASET levels and teachers 3.1 The dataset Course length 9 weeks 8 weeks 8 weeks The data under consideration is an export form the Coursera 54 105 63 Forums platform. Raw forum database tables (posts, comments, tags, (14 top level) (17 top-level) (15 top-level) votes) as well as a JSON based web clickstream were used. The Design mode All-at-once All-at-once Sequential clickstream events consist of a key which specifies action – either a ‘pageview’ or ‘video’ item. Forum clickstream events were Delivery mode All-at-once Staggered Staggered identified by a common ‘/forum’ prefix. Use of forums Tangential Core activity Support The clickstream was further classified into: browsing; profile N in forum 422 (2.1%) 1685 (9.3%) 293 (2.8%) lookups; social interaction (looking at contributions); search; tagging; and threads. From the classification it became evident the 1361 6361 1399 Tot posts clickstream did not record all events, such as when a post or (avg=3.3) (avg=3.8) (avg=4.8) comment was made, or when votes were applied. For these, 285 2728 901 Tot comments specific database tables were used. In order to manage different (avg=0.7) (avg=1.7) (avg=3.1) data sets and sources, a standardized schema was built, allowing Registrants 32705 28558 22466 disparate sources to feed into, but exposing a common interface to conduct analysis over forum activities. This is shown in Figure 1. Active 60% 63% 47% students1 Figure 1. Forum data transformation process 4.2% 4.4% 0.7% Completing2 (0.3% D) (2.4 D) (0.2%) Table 1. Summary of the courses under investigation. NOTE: 1. Active students are those appearing in the clickstream; 2. Completing students achieve the pass grade or Distinction (D) The type of activity is summarised in Figure 2. In the chart, the five categories refer to the following: View corresponds to listing forums, threads and viewing posts; Post is the writing of a post or start of a new thread; Comment is a reply to an existing post; Social refers to all actions engaging directly with other’s status (up-vote, down-vote and looking at profiles/reputation); Engage refers to the additional interaction with forums content (searching, tagging, ‘watching’ or subscribing to posts or threads). The viewing behaviour is the most prominent for both the student and instructor groups and the figures are pretty much similar across the board. A two-way ANOVA (2x5, role by activity) on the percentage of distributions, shows that there is no significant difference between students and instructors, but there is an obvious difference between views and the other types of behaviour (F(4,29) = 1656.3, p < .01). If we consider the engagement over the timeline and compare the type of activities carried out by students and instructors, Figure 3 3.2 An overview of forums activity (end of the paper) shows the patterns for the three courses. The There are very interesting trends which require more detailed most striking pattern is that there doesn’t seem to be an obvious examination (bottom of table 1). As expected, in LTTO the forum one. For what concerns posts and views in all the three courses activity is larger than in the other courses and this is probably due there is a sense of synchronicity between the two groups, however to the fact that students were asked to submit post in forums from this chart it is not possible to understand in more detail what following the learning activities. The proportion of active students are the connections between what students and teachers do. Instructors’ comments are slightly offset, possibly as a reaction to activity of students in the forums is organized according to a set of students’ posts. An interesting aspect is the amount of ‘social’ commonly used quantitative metrics and a couple of measures engagement in the P2P course that merits further analysis. borrowed from Social Network Analysis (table 2). Although this seems to be a promising approach, there are two issues with this 4. DIRECTIONS AND OPEN QUESTIONS methodology in the MOOCs: 1) only a tiny proportion of students From this coarse analysis it is apparent that there seem to be can be considered active and 2) it is hard to scale the instructor’s minimal behavioural differences in the way students and evaluation. The first problem is not easily resolved and it is an instructors interact in the different courses, however more analysis issue in the literature reviewed [17, 18]; non-posting behavior is is required to tackle questions about the individual differences in considered as an index of disengagement, partly because this is students’ and instructors’ patterns of interaction and their easy to measure. In principle the latter could be substituted by interrelations. Furthermore little can be said about how the nature peer evaluation (up-vote, down-vote), but there is no easy way to of interactions drives the development of communication and ensure consistency. engagement. However a number of questions like the following Indicator Type Description remain open and unanswered: how do discussions develop over time? How teacher presence affects the development of Messages Quantitative Number of messages written by discussions? Is the number of forums affecting how students the student. engage with them (i.e. causing disorientation)? Threads Quantitative Number of new threads created by the student. Figure 2. Distribution of forum activities by role Words Quantitative Number of words written by the student. Sentences Quantitative Number of sentences written by the student. Reads Quantitative Number of messages read on the forum by the student. Time Quantitative Total time, in minutes, spent on forum by the student. AvgScoreMsg Qualitative Average score on the instructor's evaluation of the student's messages. Centrality Social Degree centrality of the student. Prestige Social Degree prestige of the student. Table 2. Possible indicators characterising forum engagement An alternative method that can be explored is graph-based approaches. For example, Bhattacharya et al. [19] used graph- based techniques to explore the evolution of software and source branching providing an insight in the process. Kruck et al. [20] developed GSLAP, an interactive, graph‐based tool for analyzing web site traffic based on user‐defined criteria. Kobayashi et al. [18] used a method to quickly identify and track the evolution of topics in large datasets using a mix of assignment of documents to time slices and clustering to identify discussion topics. Yang et al [21] integrated graph-based clustering to characterize the emergence of communities and text-based analysis to portray the nature of exchanges. In fact, students move in the various sub-forums taking different roles or stances as they engage with different subsets of students. As the reasons to engage in these discussions are partly determined by different interests, goals, and issues, it is possible to construct a social network graph based on the post-reply-comment structure within threads. The network generated provides a possible view of a 4.1 The DM and graph-based approaches student’s social participation within a MOOC, which may indicate A possible way to answer the questions about the types/patterns of some detail about their values, beliefs and intentions. behaviours, the structure and development of networks and the growth of groups/communities over time might be using data Furthermore, Brown et al [22] have already shown the value of mining and graph-based approaches. For example, [6] used a exploring the communities in discussion forums in MOOCs combination of quantitative, qualitative and social network particularly for what concerns the homogeneity of performance information about forum usage to predict students' success or but dissimilarity of motivations characterizing student hubs. failure in a course by applying classification algorithms and classification via clustering algorithms. In their approach the 4.2 Discussion points [6] Cristóbal Romero, Manuel-Ignacio López, Jose-María Luna, The examples above provide evidence of the potential for using and Sebastián Ventura. 2013. Predicting students’ final graph-based methods to obtain better insights into the process and performance from participation in on-line discussion forums. content analysis for our dataset and to extend its applicability to Computers & Education 68 (October 2013), 458–472. DOI: MOOCs, however there are a number of contentious points to http://dx.doi.org/10.1016/j.compedu.2013.06.009 raise which will provide opportunities for discussion. [7] Margaret Mazzolini and Sarah Maddison. 2007. When to Firstly the number of students who are actively involved in jump in: The role of the instructor in online discussion discussion is a very small proportion of the active participants. forums. Computers & Education 49, 2 (September 2007), This means that the subset may not be representative at all. One 193–213. could argue that these students are already engaged or desperately DOI:http://dx.doi.org/10.1016/j.compedu.2005.06.011 need help. Previous literature [21, 22, 23] focused on the ability [8] M.j.w. Thomas. 2002. Learning within incoherent structures: to predict performance and on the peer effect which can emerge the space of online discussion forums. Journal of Computer from the analysis of the graphs/social networks. Assisted Learning 18, 3 (September 2002), 351–366. Secondly, one could question the value of the communities in DOI:http://dx.doi.org/10.1046/j.0266-4909.2002.03800.x xMOOCs: especially when courses are designed with an [9] Daniel F.O. Onah, Jane Sinclair, and Russell Boyatt. 2014. instructivits approach leading to mastery, by definition this is an Exploring the use of MOOC discussion forums. In individualistic perspective focused on the testing of one’s own Proceedings of London International Conference on skills/learning. Of course in cMOOCs -connectivists by design- Education. London: LICE, 1–4. the importance of the development of social support is essential. [10] Alstete, J.W. and Beutell, N.J. Performance indicators in This seems to be supported by Brown et al [22]: they were not online distance learning courses: a study of management able to uncover a direct relation between stated goals and education. Quality Assurance in Education 12, 1 (2004), 6– motivations with the participation in forums, and attributed this to 14. pragmatic needs. However, as the authors suggested earlier, the instructors might play a fundamental role in shaping the [11] Cheng, C.K., Paré, D.E., Collimore, L.-M., and Joordens, S. communities based on the value attributed to forums in their Assessing the effectiveness of a voluntary online discussion plans/design and the level of engagement/interaction. Considering forum on improving students’ course performance. the split between cMOOCs and xMOOCs again, interesting work Computers & Education 56, 1 (2011), 253–261. might come out of the experiment conducted by Rose’ and [12] Palmer, S., Holt, D., and Bray, S. Does the discussion help? colleagues in the DALMOOC in which automated agents were The impact of a formally assessed online discussion on final deployed to support students’ conversations. In Coursera the student results. British Journal of Educational Technology deployment of ‘community mentors’ will be an interesting space 39, 5 (2008), 847–858. to explore, given that the importance of design seems to be [13] Patel, J. and Aghayere, A. Students’ Perspective on the removed from instructors in the ‘on-demand’ model. Impact of a Web-based Discussion Forum on Student Lastly, more research is needed in the time-based dimension of Learning. Frontiers in Education Conference, 36th Annual, development of forums in MOOCs. Questions like how students (2006), 26–31. bond and create stable relations, how they become authoritative [14] Jacqueline Aundree Baxter and Jo Haycock. 2014. Roles and and what motivates them to contribute over time are all open student identities in online large course forums: Implications questions which the analysis of graphs over time might be able to for practice. The International Review of Research in Open address. and Distributed Learning 15, 1 (January 2014). 5. REFERENCES [15] Vanessa Paz Dennen. 2008. Pedagogical lurking: Student [1] Dirk Jan van den Berg and Edward Crawley. Why MOOCS engagement in non-posting discussion behavior. Computers Are Transforming the Face of Higher Education. Retrieved in Human Behavior 24, 4 (July 2008), 1624–1633. April 12, 2015 from http://www.huffingtonpost.co.uk/dirk- DOI:http://dx.doi.org/10.1016/j.chb.2007.06.003 jan-van-den-berg/why-moocs-are- [16] René F. Kizilcec, Chris Piech, and Emily Schneider. 2013. transforming_b_4116819.html Deconstructing Disengagement: Analyzing Learner [2] Chris Parr. The evolution of Moocs. Retrieved April 12, Subpopulations in Massive Open Online Courses. In 2015 from Proceedings of the Third International Conference on http://www.timeshighereducation.co.uk/comment/opinion/th Learning Analytics and Knowledge. LAK ’13. New York, e-evolution-of-moocs/2015614.article NY, USA: ACM, 170–179. DOI:http://dx.doi.org/10.1145/2460296.2460330 [3] C. Osvaldo Rodriguez. 2012. MOOCs and the AI-Stanford Like Courses: Two Successful and Distinct Course Formats [17] Dennen, V.P. Pedagogical lurking: Student engagement in for Massive Open Online Courses. European Journal of non-posting discussion behavior. Computers in Human Open, Distance and E-Learning (January 2012). Behavior 24, 4 (2008), 1624–1633. [4] George Siemens. 2005. Connectivism: A learning theory for [18] Kobayashi, M. and Yung, R. Tracking Topic Evolution in the digital age. International journal of instructional On-Line Postings: 2006 IBM Innovation Jam Data. In T. technology and distance learning 2, 1 (2005), 3–10. Washio, E. Suzuki, K.M. Ting and A. Inokuchi, eds., Advances in Knowledge Discovery and Data Mining. [5] Stephen Downes. 2008. Places to go: Connectivism & Springer Berlin Heidelberg, 2008, 616–625. connective knowledge, Innovate. [19] Bhattacharya, P., Iliofotou, M., Neamtiu, I., and Faloutsos, [22] R. Brown, C Lynch, Y. Wang, M. Eagle, J. Albert, T. M. Graph-based Analysis and Prediction for Software Barnes, R. Baker, Y. Bergner, D. McNamara. Communities Evolution. Proceedings of the 34th International Conference of performance and communities of preference. GEDM on Software Engineering, IEEE Press (2012), 419–429. 2015, in press. [20] Kruck, S.E., Teer, F., and Jr, W.A.C. GSLAP: a graph-based [23] M. Fire, G. Katz, Y. Elovici, B. Shapira, and L. Rokach, web analysis tool. Industrial Management & Data Systems “Predicting Student Exam’s Scores by Analyzing Social 108, 2 (2008), 162–172. Network Data,” in Active Media Technology, R. Huang, A. A. Ghorbani, G. Pasi, T. Yamaguchi, N. Y. Yen, and B. Jin, [21] D. Yang, M. Wen, A. Kumar, E. P. Xing, and C. P. Rose, Eds. Springer Berlin Heidelberg, 2012, pp. 584–595. “Towards an integration of text and graph clustering methods as a lens for studying social interaction in MOOCs,” The International Review of Research in Open and Distributed Learning, vol. 15, no. 5, Oct. 2014. Figure 3. Time sequence of activity in the forums in the three courses by students and instructors grouped by activity type BOTS: Selecting Next-Steps from Player Traces in a Puzzle Game Drew Hicks Yihuan Dong Rui Zhi North Carolina State North Carolina State North Carolina State University University University 911 Oval Drive 911 Oval Drive 911 Oval Drive Raleigh, NC 27606 Raleigh, NC 27606 Raleigh, NC 27606 aghicks3@ncsu.edu ydong2@ncsu.edu rzhi@ncsu.edu Veronica Cateté Tiffany Barnes North Carolina State North Carolina State University University 911 Oval Drive 911 Oval Drive Raleigh, NC 27606 Raleigh, NC 27606 vmcatete@ncsu.edu tmbarnes@ncsu.edu ABSTRACT model student behavior in tutors, and provide insight into In the field of Intelligent Tutoring Systems, data-driven meth- problem-solving strategies and misconceptions. This data ods for providing hints and feedback are becoming increas- structure can be used to provide hints, by treating the Inter- ingly popular. One such method, Hint Factory, builds an action Network similarly to a Markov Decision Process and interaction network out of observed player traces. This data selecting the most appropriate next step from the requesting structure is used to select the most appropriate next step user’s current state. This method has successfully been em- from any previously observed state, which can then be used ployed in systems in which each action a player may take is to provide guidance to future players. However, this method of similar cost; for example in the Deep Thought logic tutor has previously been employed in systems in which each ac- each action is an application of a particular axiom. Applying tion a player may take requires roughly similar effort; that this method to an environment where actions are of differ- is, the “step cost” is constant no matter what action is taken. ent costs, or outcomes are of varying value will require some We hope to apply similar methods to an interaction network adaptations to be made. In this work, we discuss how we built from player traces in our game, BOTS; However, each will apply Hint Factory methods to an interaction network edge can represent a varied amount of effort on the part of built from player traces in a puzzle game, BOTS. In BOTS, the student. Therefore, a different hint selection policy may each “Action” is the set of changes made to the program be needed. In this paper, we discuss the problems with our between each run. Therefore, using the current hint selec- current hint policy, assuming all edges are the same cost. tion policy would result in very high-level hints comprising Then, we discuss potential alternative hint selection policies a great number of changes to the student’s program. Since we have considered. this is undesirable, a different hint selection policy may be needed. Keywords Hint Generation, Serious Games, Data Mining 2. DATA-DRIVEN HINTS AND FEEDBACK In the ITS community, several methods have been proposed 1. INTRODUCTION for generating hints/feedback from previous observations of Data-driven methods for providing hints and feedback are users’ solutions or behavior. Rivers et al propose a data- becoming increasingly popular, and are especially useful for driven method to generate hints automatically for novice environments with user- or procedurally-generated content. programmers based on Hint Factory[8]. They present a One such method, Hint Factory, builds an interaction net- domain-independent algorithm, which automates hint gener- work out of observed player traces. An Interaction Network ation. Their method relies on solution space, which utilizes is a complex network of student-tutor interactions, used to graph to represent the solution states. In solution space, each node represents a candidate solution and each edge rep- resents the action used to transfer from one state to another. Due to the existence of multiple ways to solve a program- ming problem, the size of the solution space is huge and thus it is impractical to use. A Canonicalizing model is used to reduce the size of the solution space. All states are trans- formed to canonicalized abstract syntax trees (ASTs). If the canonical form of two different states are identical, they can be combined together. After simplifying the solution space, hint generation is implemented. If the current state is in- correct and not in the solution space, the path construction algorithm will find an optimal goal state in the solution space which is closest to current state. This algorithm uses change vectors to denote the change between current state and goal state. Once a better goal state is found during enumerat- ing all possible changes, it returns the current combination of change vectors. Each change vector can be applied to current state and then form an intermediate state. The in- termediate states are measured by desirability score, which represents the value of the state. And then the path con- struction algorithm generates optimal next states based on the rank of the desirability scores of all the intermediate states. Thus a new path can be formed and added to the solution space, and appropriate hints can be generated. Jin et al propose linkage graph to generate hints for pro- gramming courses[4]. Linkage graph uses nodes to represent program statements and direct edges to indicate ordered de- Figure 1: The BOTS interface. The robot’s program pendencies of those statements. Jin’s approach applies ma- is along the left side of the screen. The “toolbox” of trix to store linkage graph for computation. To generate available commands is along the top of the screen. linkage matrix, first, they normalize variables in programs by using instructor-provided variable specification file. After variable normalization, they sort the statement with 3 steps: (i) preprocessing, which breaks a single declaration for mul- information and check if hints are available for the state. tiple variables (e.g. int a, b, c) into multiple declaration The action that leads to subsequent state with the highest statements (e.g. int a; int b; int c;); (ii) creating statement value is used to generate a hint sequence. A hint sequence sets according to variable dependencies, which put indepen- consists of four types of hints and are ordered from gen- dent statements into first set, put statements depend only eral hint to detailed hint. Hint provider will then show hint on statements in the first set into second set, put statements from top of the sequence to the student. Hint Factory has depends only on statements in the first and second set into been applied in logic tutors which helps students learn logic third set, and so on; (iii) in-set statement sorting, during proof. The result shows that the hint-generating function which the statements are sorted in decreasing order within could provide hints over 80% of the time. set using their variable signatures. In hint generation, they first generate linkage graphs with a set of correct solutions, 3. BOTS as the sources for hint generation. They also compose the BOTS is a programming puzzle game designed to teach intermediate steps during program development into a large fundamental ideas of programming and problem-solving to linkage graph, and assign a reward value to each state and novice computer users. The goal of the BOTS project is the correct solution. Then, they apply value iteration to to investigate how to best use community-authored content create a Markov Decision Process (MDP). When a student within serious games and educational games. BOTS was requires hint, tutor will generate a linkage graph for the par- inspired by games like LightBot [10] and RoboRally [2], as tial program and try to find the closest match in MDP. If well as the success of Scratch and it’s online community [1] a match is found in MDP, the tutor would generate hint [5]. In BOTS, players take on the role of programmers writ- with the next best state based on highest assigned value. ing code to navigate a simple robot around a grid-based 3D If a match is not found in current MDP, which means the environment, as seen in Figure 1. The goal of each puzzle student is taking a different approach from existing correct is to press several switches within the environment, which solutions, the tutor will try to modify those correct solutions can be done by placing an object or the robot on them. to fit student’s program and then provide hints. To program the robots, players will use simple graphical pseudo-code, allowing them to move the robot, repeat sets Hint Factory[9] is an automatic hint generation technique of commands using “for” or “while” loops, and re-use chunks which uses Markov decision processes (MDPs) to generate of code using functions. Within each puzzle, players’ scores contextualized hints from past student data. It mainly con- depend on the number of commands used, with lower scores sists of two parts - Markov Decision Process (MDP) gener- being preferable. In addition, each puzzle limits the maxi- ator and hint provider. The MDP generator runs a process mum number of commands, as well as the number of times to generate MDP values for all states seen in previous stu- each command can be used. For example, in the tutorial dents’ solutions. In this process, all the students’ solutions levels, a user may only use the “Move Forward” instruction are combined together to form a single graph. Each node 10 times. Therefore, if a player wants to make the robot of the graph represents a state, and each edge represents an walk down a long hallway, it will be more efficient to use a action one student takes to transform from current state to loop to repeat a single “Move Forward” instruction, rather another state. Once the graph is built, the MDP generator than to simply use several “Move Forward” instructions one uses Bellman backup to assign values for all nodes. After up- after the other. These constraints are meant to encourage dating all values, a hint file is generated. The hint provider players to re-use code and optimize their solutions. uses hint file to provide hint. When a student asks for a hint at a existing state, hint provider will retrieve current state In addition to the guided tutorial mode, BOTS also con- tains an extensive “Free Play” mode, with a wide selection of puzzles created by other players. The game, in line with a b the “Flow of Inspiration” principles outlined by Alexander Repenning [7], provides multiple ways for players to share knowledge through authoring and modifying content. Play- ers are able to create their own puzzles to share with their peers, and can play and evaluate friends’ puzzles, improv- ing on past solutions. Features such as peer-authored hints c d for difficult puzzles, and a collaborative filtering approach to rating are planned next steps for the game’s online ele- ment. We hope to create an environment where players can Error continually challenge their peers to find consistently better solutions for increasingly difficult problems. e f User-generated content supports replayability and a sense of a community for a serious game. We believe that user- created puzzles could improve interest, encouraging students to return to the game to solidify their mastery of old skills and potentially helping them pick up new ones. Figure 2: Several generated hints for a simple puz- zle. The blue icon represents the robot. The ’X’ icon 4. ANALYSIS represents a goal. Shaded boxes are boxes placed on goals, while unshaded boxes are not on goals. Hint 4.1 Dataset F in this figure depicts the start and most common Data for the BOTS studies has come from a middle school goal state of the puzzle. computer science enrichment program called SPARCS. In this program, the students attend class on Saturday for 4 hours where computer science undergraduates teach them (start/intermediate) → (intermediate/goal) These transitions about computational thinking and programming. Students occur when a student moves the robot to an intermediate attend a total of 7 sessions, each on a different topic, ranging state rather than directly to the goal. Since players run from security and encryption to game design. The students their programs to produce output, we speculate that these all attend the same magnet middle school. The demograph- may represent subgoals such asmoving a box onto a specific ics for this club are 74.2% male, 25.8% female, 36.7% African switch. After accomplishing that task, the user then ap- American, and 23.3% Hispanic. The student’s grade distri- pends to their program, moving towards a new objective, bution is 58% 6th grade, 36% 7th grade and 6% 8th grade. until they reach a goal state. Hint B in Figure 2 shows a hint generated from such a transition. From these sessions, we collected gameplay data for 20 tu- torial puzzles as well as 13 user-created puzzles, With this data, we created an Interaction Network in order to be able 4.2.2 Correction Transition to provide hints and feedback for future students [3]. How- (error/mistake) → (intermediate/goal) This transition oc- ever, using program edits as states, the interaction networks curs when a student makes and then corrects a mistake. produced were very sparse. In order to be better able to These are especially useful because we can offer hints based relate similar actions, we produced another interaction net- on the type of mistake. Hints D and E in Figure 2 show hints work using program output as our state definition [6]. built from this type of transition; however, hint E shows a case where a student resolved the mistake in a suboptimal way. 4.2 States and Transitions Based on the data collected, we can divide the set of ob- served states into classes. First among these is the start state 4.2.3 Simple Solution Transition in which the problem begins. By definition, every player’s (start) → (goal) This occurs when a student enters an entire, path must begin at this state. Next is the set of goal states correct program, and solves the puzzle in one attempt. This in which all buttons on the stage are pressed. These are makes such transitions not particularly useful for generating reported by the game as correct solutions. Any complete hints, other than showing a potential solution state of the solution, by definition, ends at one such state. Among states puzzle. Hint F in Figure 2 shows this type of transition. which are neither start nor goal states, there are three im- portant classifications: Intermediate states (states a robot moves through during a correct solution), mistake states 4.2.4 Rethinking Transition (states a robot does not move through during a correct so- (intermediate) → (intermediate/goal) This transition occurs lution), and error states (states which result from illegal when rather than appending to the program as in a subgoal output, like attempting to move the robot out-of-bounds). transition, the user deletes part or all of their program, then Based on these types of states, we classified our hints based moving towards a new goal. As a result, the first state is on the transitions they represented. unrelated to the next state the player reaches. Offering this state as a hint would likely not help guide a different user. Hint A in Figure 2 shows an example of this. Finding and 4.2.1 Subgoal Transition recognizing these is an important direction for future work. 4.2.5 Error Transition (start/intermediate) → (mistake/error) This corresponds to a program which walks the robot out of bounds, into an object, or other similar errors. While we disregarded these as hints, this type of transition may still be useful. In such a case, the last legal output before the error could be a valuable state. Hint C in Figure 2 is one such case. 4.3 Next Steps While this approach was able to help us identify interest- Figure 3: This subgraph shows a short-circuit where ing transitions, as well as significantly reduce the sparseness a player bypasses a chain of several steps. of the Interaction Network by merging states with similar output, we violate several assumptions of the Hint Factory technique by using user compilation as an action. Essen- Another problem which arises from this state representation tially, the cost of an action can vary widely. In the most is seen in Hints C and E above. These hints show states extreme examples, the best next state selected by Hint Fac- where a student traveled from a state to a worse state before tory will simply be the goal state. ultimately solving the problem. Since we limit our search for hintable states to the immediate child states of s in s0 , we are 4.4 Current Hint Policy unable to escape from such a situation if the path containing Our current hint selection policy is the same as the one used the error is the best or only observable path to the goal. in the logic tutor Deep Thought with a few exceptions [9]. We combine all student solution paths into a single graph, mapping identical states to one another (comparing either 4.5 Proposed Hint Policies One potential modification of the hint policy involves ana- the programs or the output). Then, we calculate a fitness lyzing the programs/output on the nodes, using some dis- value for each node. We assign a large positive value (100) tance metric δ(s, s0 ). This measurement would be used in to each goal state, a low value for dead-end states (0) and addition to the state’s independent fitness value R(s) which a step cost for each step taken (1). Setting a non-zero cost takes into account distance from a goal, but is irrespective on actions biases us towards shorter solutions. We then of the distance from any previous state. For example in the calculate fitness values V (s) for each state s, where R(s) is short-circuit example above, using “difference in number of the initial fitness value for the state, γ is a discount factor, lines of code” as a distance metric we could take into ac- and P (s, s0 ) is the observed frequency with which users in count how far the “Goal” state is from the “Start” state, and state s go to state s0 next, via taking the action a. The potentially choose a nearer state as a hint. This also helps equation for determining the fitness value of a state is as correct for small error-correction steps in player solutions; follows: if the change between the current state and the target hint state is very small, we may want to consider hinting toward the next step instead, or a different solution path altogether. X V (s) := R(s) + γmax Pa (s, s0 )V (s0 ) (1) a s0 X V (s) := R(s) + γmax δ(s, s0 )P (s, s0 )V (s0 ) (3) a However, in our current representation there is only one s0 available action from any state: “run.” Different players us- ing this action will change their programs in different ways One potential downside to this approach is that it requires between runs, so it is not useful to aggregate across all the somewhat more knowledge of the domain to be built into the possible resulting states. Instead, we want to consider each model. If the distance metric used is inaccurate or flawed, resulting state on its own. As a result, we use a simplified there may be cases where we choose a very suboptimal hint. version of the above, essentially considering each possible using difference in lines of code as our distance metric, the resulting state s0 as the definite result of its own action: change between a state where a player is using no functions and a state where the user writes existing code into a func- tion may be very small. Hints selected in these cases might V (s) := R(s) + γmax 0 P (s, s0 )V (s0 ) (2) guide students away from desired outcomes in our game. s Another problem we need to resolve with our current hint Since the action “run” can encompass many changes, select- policy, as discussed above, is the case where the best or ing the s0 which maximizes the value may not always be the only path to a goal from a given state s has an error as best choice for a hint. The difference between s and s0 can a direct child s0 . One method of resolving this could be, be quite large, and this is usually the case when an expert instead of offering s0 as a hint, continuing to ask for next- user solves the problem in one try, forming an edge directly step hints from s0 until some s0 is a hintable, non-error state. between the “start” state and “goal” state. These and other This solution requires no additional knowledge of the game “short-circuits” make it difficult to assess which of the child domain, however it’s possible that the hint produced will nodes would be best to offer as a hint by simply using the be very far from s, or that we may skip over important calculated fitness value. information about how to resolve the error or misconception that led the student into state s in the first place. [4] W. Jin, T. Barnes, J. Stamper, M. J. Eagle, M. W. Johnson, and L. Lehmann. Program representation for Other modifications to the hint selection policy may produce automatic hint generation for a data-driven novice better results than these. We hope to look into as many programming tutor. In Intelligent Tutoring Systems, possible modifications as we can, seeing which modifications pages 304–309. Springer, 2012. produce the most suitable hints on our current dataset be- [5] D. J. Malan and H. H. Leitner. Scratch for budding fore settling on an implementation for the live version of the computer scientists. ACM SIGCSE Bulletin, game. 39(1):223–227, 2007. [6] B. Peddycord III, A. Hicks, and T. Barnes. 5. ACKNOWLEDGMENTS Generating hints for programming problems using Thanks to the additional developers who have worked on this intermediate output. project or helped with our outreach activities so far, includ- [7] A. Repenning, A. Basawapatna, and K. H. Koh. ing Aaron Quidley, Trevor Brennan, Irena Rindos, Vincent Making university education more like middle school Bugica, Victoria Cooper, Dustin Culler, Shaun Pickford, computer club: facilitating the flow of inspiration. In Antoine Campbell, and Javier Olaya. This material is based Proceedings of the 14th Western Canadian Conference upon work supported by the National Science Foundation on Computing Education, pages 9–16. ACM, 2009. Graduate Research Fellowship under Grant No. 0900860 [8] K. Rivers and K. R. Koedinger. Automating hint and Grant No. 1252376. generation with solution space path construction. In Intelligent Tutoring Systems, pages 329–339. Springer, 6. REFERENCES 2014. [1] I. F. de Kereki. Scratch: Applications in computer [9] J. Stamper, T. Barnes, L. Lehmann, and M. Croy. science 1. In Frontiers in Education Conference, 2008. The hint factory: Automatic generation of FIE 2008. 38th Annual, pages T3B–7. IEEE, 2008. contextualized help for existing computer aided [2] R. Garfield. Roborally. [Board Game], 1994. instruction. In Proceedings of the 9th International [3] A. Hicks, B. Peddycord III, and T. Barnes. Building Conference on Intelligent Tutoring Systems Young games to learn from their players: Generating hints in Researchers Track, pages 71–78, 2008. a serious game. In Intelligent Tutoring Systems, pages [10] D. Yaroslavski. LightBot. [Video Game], 2008. 312–317. Springer, 2014. Semantic Graphs for Mathematics Word Problems based on Mathematics Terminology Rogers Jeffrey Leo John Thomas S. McTavish Rebecca J. Passonneau Center for Computational Center for Digital Data, Center for Computational Learning Systems Analytics & Adaptive Learning Learning Systems Columbia University Pearson Columbia University New York, NY, USA Austin, TX, USA New York, NY, USA rl2689@columbia.edu tom.mctavish@pearson.com becky@ccls.columbia.edu ABSTRACT ercises, we randomly walk a graph using the cosine distance We present a graph-based approach to discover and extend as edge weights. We also recast the problem as a bipartite semantic relationships found in a mathematics curriculum graph with exercises on one side and words on the other, to more general network structures that can illuminate re- providing an edge when an exercise contains the math word. lationships within the instructional material. Using words We contrast these two different types of random walks and representative of a secondary level mathematics curriculum find somewhat similar results, which lends confidence to the we identified in separate work, we constructed two similar- analysis. The bipartite graph walks, however, are more sen- ity networks of word problems in a mathematics textbook, sitive to differences in word frequency. Casting measures of and used analogous random walks over the two networks similarity as graphs and performing random walks on them to discover patterns. The two graph walks provide similar affords more nuanced ways of relating objects, which can be global views of problem similarity within and across chap- used to build more granular domain models for analysis of ters, but are affected differently by number of math words prerequisites, instructional design, and adaptive learning. in a problem and math word frequency. 1. INTRODUCTION 2. RELATED WORK Curricula are compiled learning objects, typically presented Random walks over graphs have been used extensively to in sequential order and arranged hierarchically, as in a book’s measure text similarity. Applications include similarity of Table of Contents. Ideally, a domain model captures rela- web pages [15] and other documents [5], citations [1], pas- tionships between the learning objects and the knowledge sages [14], person names in email [12] and so on. More re- components or skills they exercise. Unfortunately, domain cently, general methods that link graph walks with external models are not often granular enough for optimal learning resources like WordNet have been developed to produce a experiences. For example, prerequisite relationships may be single system that handles semantic similarity for words, lacking, or the knowledge components associated with an sentences or text [16]. Very little work compares walks over exercise may be unknown. In such cases, assessments on graphs of the same content, where the graphs have differ- those learning objects will be insufficient to enable appro- ent structure. We create two different kinds of graphs for priate redirection unless expert (i.e. teacher) intervention mathematics word problem and compare the results. We is explicitly given. Domain models remain coarse because find that the global results are very similar, which is good using experts to enumerate and relate the knowledge com- evidence for the general approach, and we find differences in ponents is costly. detail that suggest further investigation could lead to cus- tomizable methods, depending on needs. As a means to automatically discover relationships among learning objects and to reveal their knowledge components, An initiative where elementary science and math tests are a we demonstrate the use of direct similarity metrics and ran- driver for artificial intelligence has led to work on knowledge dom graph walks to relate exercises in a mathematics cur- extraction from textbooks. Berant et al. [2] create a system riculum. We first apply a standard cosine similarity measure to perform domain-specific deep semantic analysis of a 48 between pairs of exercises, based on bag-of-word vectors con- paragraphs from a biology textbook for question answering. sisting of math terms that we identified in separate work Extracted relations serve as a knowledge base against which [7]. Then, to extract less explicit relationships between ex- to answer questions, and answering a question is treated as finding a proof. A shallow approach to knowledge extraction from a fourth grade science curriculum is taken in [6], and the knowledge base is extended through dialog with users until a path in the knowledge network can be found that supports a known answer. In the math domain, Kushman et al. [10] generate a global representation of algebra problems in order to solve them by extracting relations from sentences and aligning them. Seo et al. [18] study text and diagrams together in order to understand the diagrams better through textual cues. We are concerned with alignment of content Two machines produce the same type of widget. Machine high agreement. All the non-stop words were then labeled A produces W widgets, X of which are damaged. Machine B by a trained annotator. We developed a supervised machine produces Y widgets, Z of which are damaged. The fraction of learning approach to classify vocabulary into math and non- X damaged widgets for Machine A is W or (simplified fraction). math words [7] that can be applied to new mathematics cur- Z The fraction of damaged widgets for Machine B is Y or ricula. For the text used here, there were 577 math terms. (simplified fraction). Write each fraction as a decimal and a percent. Use pencil and paper. Select a small percent that would allow for a small number of damaged widgets. Find 3.3 Random Walks in Graphs A random walk on a graph starts at a given node and steps the number of widgets by which each machine exceeded the with random probability to a neighboring node. The same acceptable number of widgets. random decision process is employed at this and every sub- sequent node until a termination criterion is met. Each time Figure 1: Sample problem; math terms are in boldface. a node is visited, it is counted. Open random walks require that the start node and end nodes differ. Traversal methods may employ a bias to navigate toward or away from certain across rather than within problems, and our objective is neighbors through edge weights or other graph attributes. finer-grained analysis of curricula. In a graph, G = (V, E) with nodes V and edges E, a ran- Other work that addresses knowledge representation from dom walk that begins at vx and ends at vy can be denoted as text includes ontology learning [3], which often focuses on (vx , ..., vy ). By performing several random walks, the frac- the acquisition of sets of facts from text [4]. There has tion of times the node vy is visited converges to the prob- been some work on linking lexical resources like WordNet ability of target vy being visited given the start node vx , or FrameNet to formal ontologies [17, 13], which could pro- which can be expressed as P (vy |vx ) under the conditions of vide a foundation for reasoning over facts extracted from the walk. In the case of a random walk length of 1, P (vy |vx ) text. We find one work that applies relation mining to e- will simply measure the probability of vy being selected as learning: Šimko and Bieliková [19] apply automated relation an adjacent node to vx . mining to extract relations to support e-course authoring in the domain of teaching functional programming. Li et al. [11] apply k-means clustering to a combination of problem 3.4 Cosine Similarity Graph features and student performance features, and propose the Math exercises are represented as bag-of-words vectors with clusters correspond to Knowledge Components [8]. boolean values to indicate whether a given math term is present. Cosine similarity quantifies the angle between the two vectors, and is given by the dot product of two vectors. 3. METHODS Pn 3.1 Data te i=1 ti ei cos(t, e) = = pPn pPn (1) We used 1800 exercises from 17 chapters of a Grade 7 mathe- ktkkek i=1 (ti )2 i=1 (ei ) 2 matics curriculum. Most are word problems, as illustrated in Similarity values of 1 indicate that both the vectors are the Figure 1. They can incorporate images, tables, and graphs, same whereas a value of zero indicates orthogonality between but for our analysis, we use only the text. The vocabu- the two vectors. Pairwise cosine similarities for all 1800 lary of the resulting text consists of 3,500 distinct words. exercises were computed, yielding a cosine similarity matrix We construct graphs where math exercises are the nodes, or Mcos . The matrix corresponds to a graph where non-zero in a bipartite graph, math exercises are the left side nodes cosine similarities are edge weights between exercises. and words are the right side nodes. Our initial focus is on exercise similarity due to similarity of the math skills that In a graph walk, the probability that a node vy will be exercises tap into, and we use mathematics terminology as reached in one step from a node vx is given by the prod- an indirect proxy of skills a problem draws upon. uct of the degree centrality of vx and the normalized edge weight (vx , vy ). With each exercise as a starting node, we 3.2 Math Terminology performed 100,000 random walks on the cosine-similarity The text of the word problems includes ordinary language graph, stepping with proportional probability to all outgo- expressions unrelated to the mathematics curriculum, such ing cosine similarity weights. To measure 2nd degrees of as the nouns machines, widgets shown in problem in Fig- separation, with each walk we made two steps. ure 1, or the verbs produces, damaged. For our purposes, mathematics terminology consists of words that expresses For two math vectors considered as the sets A and B, cosine concepts that are needed for the mathematical competence similarity can be conceptualized in terms of the intersection the curriculum addresses. To identify these terms, we devel- set C = A ∪ B and set differences A \ B and B \ A. Cosine oped annotation guidelines for human annotators who label similarity is high when |C|  A \ B and |C|  B \ A. words in their contexts of use, and assessed the reliability of annotation by these guidelines. Words can be used in the The degree of a node affects the probability of traversing math texts sometimes in a math sense and sometimes in a any edge from that node. The two factors that affect de- non-math sense. Annotators were instructed to label terms gree centrality of a start node are the document frequencies based on the most frequent usage. of its math words, and the total number of math words. Here, document frequency (df) is the normalized number of Using a chance-adjusted agreement coefficient in [-1,1] [9], exercises a word occurs in. A high df math word in a prob- reliability among three annotators was 0.81, representing lem increase its degree centrality because there will be more Figure 2: Exercise-to-Exercise similarity in Chapter 6. The exercises of Chapter 6 are displayed in row-columns in a square matrix. The rows represent source nodes and columns represent targets. Each row has been normalized across the book, even though only Chapter 6 is shown. The axes demarcate the sections of the chapter. Mcos is the cosine similarity. Mcosrw is the output from the random walk using cosine similarity edge weights, Mbp is the output from the random bipartite walk. Raw values displayed between 0 and 0.005 corresponding to light and dark pixels, respectively. problems it can share words with, resulting in non-zero co- the distribution across the row as a probability distribution sine values and therefore edges. The number of math words to all other nodes for that source node. in a problem also increases its degree centrality. 4. RESULTS 3.5 Bipartite exercise and word graph We compare the three measures of similarity between ex- The set of exercises Ve are the left-side nodes and the math ercises: 1) cosine similarity, 2) random walks using cosine words Vw are the right-side nodes in the undirected bipartite similarity as edge weights, and 3) random walks along a bi- graph G = (Ve , Vw , E), where an edge exists between vex and partite graph of exercises and words. vwi if exercise x contains the math word i. We performed open random walks on this graph to measure 4.1 Exercise-to-Exercise Similarity We describe exercise-to-exercise similarity with square ma- similarity between nodes. To measure the similarity of ex- trices where each exercise is represented as a row-column. A ercises, we walk in even steps – a step to a connected word number of features of the measures are embedded in Figure followed by a step back to one of the exercises that shares 2, which shows heatmaps of color values for pairs of exercises that word. The degrees of separation between vertices on in chapter 6 for each matrix. We find that within chapters the same side of the graph (e.g. exercise-to-exercise) will be and especially within sections of those chapters, there is a l/2 where l is the length of the walk. In this paper, we ex- high degree of similarity between exercises regardless of the plored first and second degrees of separation so our bipartite measure. This demonstrates that words within sections and graphs had a walk length of 4. chapters share a common vocabulary. We can see that Mcos Table 1: Summary statistics of the similarity distributions has more extreme values than Mcosrw ; as explained below, it has both more zero cosine values, and more very high cosine rwcos rwbp values. This is most likely because Mcosrw , from doing the minimum 0 0 0 walk, picks up exercises that are another degree of separa- maximum 6.3 ×10−2 0.50 0.11 tion away. When the row of the matrix is normalized to mean 5.55 ×10−4 5.55 ×10−4 5.55 ×10−4 capture the distribution of the source node, the otherwise median 0 2.41 ×10−4 2.06 ×10−4 high values from Mcos are tempered in the Mcosrw matrix. std. dev. 1.19 ×10−3 8.57 ×10−4 1.24 ×10−3 This shift to a large number of lower scores is shown in the bottom panel of Figure 3. Mbp and Mcosrw are very similar, Because exercise nodes are connected via word nodes, we in- but Mbp generally has a wider dynamic range. terpret the fraction of node visits as a similarity measure be- tween the source node and any node visited. We performed 4.2 Comparison of the Graph Walks 100,000 random walks from each node. Exercise-to-exercise Table 1 provides summary statistics for cosine similarity and similarity can be visualzed as square matrices with source the two random walks for all pairs of problems (N=3,250,809). nodes in the rows and target nodes in the columns. To fac- The cosine matrix is very sparse, as shown by the median tor out the times a source may have been selected as one of value of 0. Of the two random walk similarities, rwcos has the targets, we set the diagonal of the matrix to zero. We a lower standard deviation around the mean, but otherwise then normalized across the rows so that we could interpret the two random walks produce similar distributions. The similarity values given by cosine and the cosine random Table 3: Two pairs of problems with high cosine similarity walk will increasingly differ the more that the start problem and reverse patterns of graph walk similarity. The first pair, has relatively higher degree centrality due either to more from section 5, have lower than average rwcos and higher words or higher frequency of words in exercises (df). For than average rwbp due to relatively many words in common reference, the word that occurs most frequently, number, (12 and 14). The second pair, from section 6, have higher has a df of 0.42, and the second most frequent occurs in than average rwcos and lower than average rwbp due to rel- only 15% of the exercises. Fifty eight nodes have no edges atively few words in common. (0 degree), the most frequent number of edges is 170, and the maximum is 1,706. Table 2 gives the summary statistics Prob 1 Prob 2 cosine rwcos rwbp N max df for df, number of math words, and degree centrality. 6.5.99 6.5.85 1.0000 0.0032 0.0102 12 0.42 6.5.94 6.5.83 0.8819 0.0026 0.0064 14 0.13 Inspection of the data shows that for pairs of problems in 6.6.109 6.6.102 0.8660 0.0068 0.0037 3 0.11 the two chapters for our case study, if the cosine similar- 6.6.104 6.6.102 0.7746 0.0068 0.0029 3 0.11 ity between a pair is high (≥ 0.75), the similarity values for rwcos tend to go down as the number of shared word increases from 3 to between 5 and 7. For the rwbp , the cosine similarity is less than 0.20, the two walks produce opposite trend occurs, where the similarity goes up as the very similar results. The average similarity values for the number of words increases. This difference helps account for bipartite walk are about 20% higher, and the maximum val- an observed divergence in the two graph walks for sections ues are higher, but the two walks produce similar means, 5 and 6 of Chapter 6. independent of the lengths of the common word vectors, or the total number of math words. Table 3 illustrates two pairs of problems from section 5 that have high cosine similarities, and relatively higher rwbp sim- Since we normalized the matrices across rows, which are ilarities (greater than the rw means of 0.0055) and relatively the source nodes, differences between the bipartite matrix, lower rwcos (lower than the rw means). The reverse pattern Mbp , and the cosine matrices implied that the degree of the is seen for two pairs of problems from section 6 that have target node had a greater impact on the variability in the high cosine similarities. These problems have higher than bipartite matrix. To measure the impact of the edge degree average rwcos and lower than average rwbp . What differen- on the target nodes, we considered the column sum for those tiates the two pairs of problems is that the section 5 prob- targets that had 1 edge, those that had 2, etc. up to 20 lems have a relatively large number of words in common: edges. The results are summarized in Figure 4. As can be 14 for the first pair, 12 for the second pair. In both pairs, seen, the column sum varies linearly by the number of target some of the words have relatively high document frequency. edges in the bipartite matrix, whereas the cosine matrices As discussed above, these two properties increase the degree do not. We found the cubed root of the column sum in Mbp centrality of the start node of a step in the rwcos graph, and approaches the distribution of column sums of the cosine thus lower the probability of hitting each of the start node’s matrices, which is provided in Figure 4. one-degree neighbors. This effect propagates along the two steps of the walk. For the rwbp graph, however, as the num- ber of shared math words for a pair of problems increases, the number of paths from one to the other also increases, thus raising the probability of the traversal. This effect also propagates through a two-step walk. In contrast to the sec- tion 5 problems, the two section 6 problems have relatively fewer words in common: 3 for both pairs. For problem pairs where the cosine similarity is between 0.40 and 0.60, the mean similarity from rwbp is 30% higher than for rwcos for when the number of math words in com- mon is 3 (0.0033 vs. 0.0043), 80% higher when the number of math words in common is 6 (0.0024 versus 0.0045), and Figure 3: Tail distribution of similarity values in Mcos , three as high when the number of math words in common Mcosrw , and Mbp . Because 62% of the values in Mcos are 0, is 9 (0.0023 versus 0.0068). For problems pairs where the the plot shows only non-zero values. 5. CONCLUSION Table 2: Summary statistics of document frequency (df) of Visualization of the three similarity matrices shows they re- math words, number of math words in problems, and degree veal the same overall patterns, thus each is confirmed by centrality of rwcos the others. However, the bipartite walk was the most sen- sitive to word frequency across exercises, and the number df math words degree ctr. rwcos of words in problems. With our goal of automatically dis- minimum 5.54 ×10−4 1 0 covering knowledge components and identifying their rela- maximum 0.424 24 1,706 tionships, the random walk that stepped in proportion to mean 0.183 8.35 418.4 its cosine similarity performed best. It was able to discover median 6.66 ×10−3 8.00 340 second-degree relationships that seem reasonable as we ex- std. dev. 0.314 3.64 317.1 plore by eye those matches. Future work will test these re- knowledge-learning-instruction (KLI) framework: Toward bridging the science-practice chasm to enhance robust student learning. Cognitive Science, 36(5):757–798, 2012. [9] K. Krippendorff. Content analysis: An introduction to its methodology. Sage Publications, Beverly Hills, CA, 1980. [10] N. Kushman, Y. Artzi, L. Zettlemoyer, and R. Barzilay. Learning to automatically solve algebra word problems. In Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 271–281, Baltimore, Maryland, June 2014. Association for Figure 4: Distribution of column sums by number of edges Computational Linguistics. in the target node represented by the column. Error plots show the mean and standard error for each type. Black line [11] N. Li, W. W. Cohen, and K. R. Koedinger. is the cubed root of the mean of the column sums of Mbp . Discovering student models with a clustering algorithm using problem content. In Proceedings of the 6th International Conference on Educational Data lationships with student performance data. We should find, Mining, 2013. for example, that if two exercises are conceptually similar, [12] E. Minkov, W. W. Cohen, and A. Y. Ng. Contextual then student outcomes should also be similar and learning search and name disambiguation in email using curves should reveal shared knowledge components. In this graphs. In Proceedings of the 29th Annual respect, such automatically constructed knowledge graphs International ACM SIGIR Conference on Research can create more refined domain models that intelligent tu- and Development in Information Retrieval, SIGIR ’06, toring systems and robust assessments can be built upon. pages 27–34, New York, NY, USA, 2006. ACM. [13] I. Niles and A. Pease. Mapping WordNet to the 6. REFERENCES SUMO ontology. In Proceedings of the IEEE [1] Y. An, J. Janssen, and E. E. Milios. Characterizing International Knowledge Engineering Conference, and mining the citation graph of the computer science pages 23–26, 2003. literature. Knowledge and Information Systems, [14] J. Otterbacher, G. Erkan, and D. R. Radev. Biased 6(6):664–678, 2004. lexrank: Passage retrieval using random walks with [2] J. Berant, V. Srikumar, P.-C. Chen, question-based priors. Information Processing and A. Vander Linden, B. Harding, B. Huang, P. Clark, Management, 45(1):42–54, 2009. and C. D. Manning. Modeling biological processes for [15] L. Page, S. Brin, R. Motwani, and T. Winograd. The reading comprehension. In Proceedings of the 2014 pagerank citation ranking: Bringing order to the web. Conference on Empirical Methods in Natural Language Technical Report 1999-66, Stanford InfoLab, Processing (EMNLP), pages 1499–1510, Doha, Qatar, November 1999. Previous number = October 2014. Association for Computational SIDL-WP-1999-0120. Linguistics. [16] T. M. Pilehvar, D. Jurgens, and R. Navigli. Align, [3] P. Buitelaar, A. Frank, M. Hartung, and S. Racioppa. disambiguate and walk: A unified approach for Ontology-based information extraction and integration measuring semantic similarity. In Proceedings of the from heterogeneous data sources, 2008. 51st Annual Meeting of the Association for [4] A. Carlson, J. Betteridge, B. Kisiel, B. Settles, E. H. Computational Linguistics (Volume 1: Long Papers), Jr., and T. M. Mitchell. Toward an architecture for pages 1341–1351. Association for Computational never-ending language learning. In Proceedings of the Linguistics, 2013. 24th Conference on Artificial Intelligence (AAAI), [17] J. Scheffczyk, A. Pease, and M. Ellsworth. Linking volume 2, pages 1306–1313, 2010. FrameNet to the suggested upper merged ontology. In [5] G. Erkan and D. R. Radev. Lexrank: Graph-based B. Hennett and C. Fellbaum, editors, Formal Ontology lexical centrality as salience in text summarization. J. in Information Systems, pages 289–. IOS Press, 2006. Artif. Int. Res., 22(1):457–479, Dec. 2004. [18] M. J. Seo, H. Hajishirzi, A. Farhadi, and O. Etzioni. [6] B. Hixon, P. Clark, and H. Hajishirzi. Learning Diagram understanding in geometry questions. In knowledge graphs for question answering through Proceedings of the 28th AAAI Conference on Artificial conversational dialog. In Proceedings of the 2013 Intelligence, 2014. Conference of the North American Chapter of the [19] M. Šimko and M. Bieliková. Automatic concept Association for Computational Linguistics: Human relationships discovery for an adaptive e-course. In Language Technologies, Denver, CO, May-June 2015. Proceedings of the Second International Conference on [7] R. J. L. John, R. J. Passonneau, and T. S. McTavish. Educational Data Mining (EDM), pages 171–179, Semantic similarity graphs of mathematics word 2009. problems: Can terminology detection help? In Proceedings of the Eighth International Conference on Educational Data Mining, 2015. [8] K. R. Koedinger, A. T. Corbett, and C. Perfetti. The