Graph-based Educational Data Mining (G-EDM 2015)

Collin F. Lynch
Department of Computer Science
North Carolina State University
Raleigh, North Carolina
cflynch@ncsu.edu

Dr. Tiffany Barnes
Department of Computer Science
North Carolina State University
Raleigh, North Carolina
tmbarnes@ncsu.edu

Dr. Jennifer Albert
Department of Computer Science
North Carolina State University
Raleigh, North Carolina
jlsharp@ncsu.edu

Michael Eagle
Department of Computer Science
North Carolina State University
Raleigh, North Carolina
mjeagle@ncsu.edu

1. INTRODUCTION

Fundamentally, a graph is a simple concept. At a basic level a graph is a set of relationships $\{e(n_0, n_2), e(n_0, n_j), \ldots, e(n_{j-1}, n_j)\}$ between elements. This simple concept, however, has afforded the development of a complex theory of graphs [1] and rich algorithms for combinatorics [7] and clustering [4]. This has, in turn, made graphs a fundamental part of educational data mining.

Many types of data can be naturally represented as graphs, such as social network data, user-system interaction logs, argument diagrams, logical proofs, and forum discussions. Such data has grown exponentially in volume as courses have moved online and educational technology has been incorporated into the traditional classroom. Analyzing it can help to answer a range of important questions such as:

• What path(s) do high-performing students take through online educational materials?
• What social networks can foster or inhibit learning?
• Do users of online learning tools behave as the system designers expect?
• What diagnostic substructures are commonly found in student-produced diagrams?
• Can we use prior student data to identify students' solution plan, if any?
• Can we use prior student data to provide meaningful hints in complex domains?
• Can we identify students who are particularly helpful based upon their social interactions?

Thus, graphs are simple in concept, general in structure, and have wide applications for Educational Data Mining (EDM). Despite the importance of graphs to data mining and data analysis, there exists no strong community of researchers focused on Graph-Based Educational Data Mining. Such a community is important to foster useful interactions, to share tools and techniques, and to explore common problems.

2. GEDM 2014

This is the second workshop on Graph-Based Educational Data Mining. The first was held in conjunction with EDM 2014 in London [17]. The focus of that workshop was on seeding an initial community of researchers and on identifying shared problems and avenues for research. The papers presented covered a range of topics including unique visualizations [13], social capital in educational networks [8], graph mining [19, 11], and tutor construction [9].

The group discussion sections at that workshop focused on the distinct uses of graph data. Some of the work presented focused on student-produced graphs as solution representations (e.g. [14, 3]) while others focused more on the use of graphs for large-scale analysis to support instructors or administrators (e.g. [18, 13]). These differing uses motivate different analytical techniques and, as participants noted, change our underlying assumptions about the graph structures in important ways.

3. GEDM 2015

Our goal in this second workshop was to build upon this nascent community structure and to explore the following questions:

1. What common goals exist for graph analysis in EDM?
2. What shared resources such as tools and repositories are required to support the community?
3. How do the structures of the graphs and the analytical methods change with the applications?

The papers that we include here fall into four broad categories: interaction, induction, assessment, and MOOCs.

Work by Poulovassilis et al. [15] and Lynch et al. [12] focuses on analyzing user-system interactions in state-based learning environments. Poulovassilis et al. focus on the analysis of individual users' solution paths and present a novel mechanism to query solution paths and identify general solution strategies. Lynch et al., by contrast, examine user-system interactions from existing model-based tutors to assess the impact of specific design decisions on student performance.

Price & Barnes [16] and Hicks et al. [6] focus on applying these same analyses in the open-ended domain of programming. Unlike more discrete tutoring domains where users enter single equations or select actions, programming tutors allow users to make drastic changes to their code on each step. This can pose challenges for data-driven methods as the student states are frequently unique and admit no easy single-step advice. Price and Barnes present a novel method for addressing the data sparsity problem by focusing on minimal-distance changes between users [16], while in related work Hicks et al. focus on the use of path weighting to select actionable advice in a complex state space [6].

The goal in much of this work is to identify rules that can be used to characterize good and poor interactions or good and poor graphs. Xue et al. sought to address this challenge in part via the automatic induction of graph rules for student-produced diagrams [22]. In their ongoing work they are applying evolutionary computation to the induction of Augmented Graph Grammars, a graph-based formalism for rules about graphs.

The work described by Leo-John et al. [10], Guerra [5], and Weber & Vas [21] takes a different tack and focuses not on graphs representing solutions or interactions but on relationships. Leo-John et al. present a novel approach for identifying closely-related word problems via semantic networks. This work is designed to support content developers and educators in examining a set of questions and in giving appropriate assignments. Guerra takes a similar approach to the assessment of users' conceptual changes when learning programming. He argues that the conceptual relationship graph affords a better mechanism for automatic assessment than individual component models. This approach is also taken up by Weber and Vas, who present a toolkit for graph-based self-assessment that is designed to bring these conceptual structures under students' direct control.

Finally, Vigentini & Clayphan [20] and Brown et al. [2] focus on the unique problems posed by MOOCs. Vigentini and Clayphan present work on the use of graph-based metrics to assess students' on-line behaviors. Brown et al., by contrast, focus not on local behaviors but on social networks, with the goal of identifying stable sub-communities of users and of assessing the impact of social relationships on users' class performance.

4. REFERENCES

[1] B. Bollobás. Modern Graph Theory. Springer Science+Business Media Inc., New York, New York, U.S.A., 1998.

[2] R. Brown, C. F. Lynch, Y. Wang, M. Eagle, J. Albert, T. Barnes, R. Baker, Y. Bergner, and D. McNamara. Communities of performance & communities of preference. In C. F. Lynch, T. Barnes, J. Albert, and M. Eagle, editors, Proceedings of the Second International Workshop on Graph-Based Educational Data Mining (GEDM 2015). CEUR-WS, June 2015. (in press).

[3] R. Dekel and K. Gal. On-line plan recognition in exploratory learning environments. In S. G. Santos and O. C. Santos, editors, Proceedings of the Workshops held at Educational Data Mining 2014, co-located with 7th International Conference on Educational Data Mining (EDM 2014), London, United Kingdom, July 4-7, 2014, volume 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.

[4] M. Girvan and M. E. J. Newman. Community structure in social and biological networks. Proc. of the National Academy of Sciences, 99(12):7821–7826, June 2002.

[5] J. Guerra. Graph analysis of student model networks. In C. F. Lynch, T. Barnes, J. Albert, and M. Eagle, editors, Proceedings of the Second International Workshop on Graph-Based Educational Data Mining (GEDM 2015). CEUR-WS, June 2015. (in press).

[6] A. Hicks, V. Catete, R. Zhi, Y. Dong, and T. Barnes. Bots: Selecting next-steps from player traces in a puzzle game. In C. F. Lynch, T. Barnes, J. Albert, and M. Eagle, editors, Proceedings of the Second International Workshop on Graph-Based Educational Data Mining (GEDM 2015). CEUR-WS, June 2015. (in press).

[7] D. E. Knuth. The Art of Computer Programming: Combinatorial Algorithms, Part 1, volume 4A. Addison-Wesley, 1st edition, 2011.

[8] V. Kovanovic, S. Joksimovic, D. Gasevic, and M. Hatala. What is the source of social capital? The association between social network position and social presence in communities of inquiry. In S. G. Santos and O. C. Santos, editors, Proceedings of the Workshops held at Educational Data Mining 2014, co-located with 7th International Conference on Educational Data Mining (EDM 2014), London, United Kingdom, July 4-7, 2014, volume 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.

[9] R. Kumar. Cross-domain performance of automatic tutor modeling algorithms. In S. G. Santos and O. C. Santos, editors, Proceedings of the Workshops held at Educational Data Mining 2014, co-located with 7th International Conference on Educational Data Mining (EDM 2014), London, United Kingdom, July 4-7, 2014, volume 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.

[10] R.-J. Leo-John, T. McTavish, and R. Passonneau. Semantic graphs for mathematics word problems based on mathematics terminology. In C. F. Lynch, T. Barnes, J. Albert, and M. Eagle, editors, Proceedings of the Second International Workshop on Graph-Based Educational Data Mining (GEDM 2015). CEUR-WS, June 2015. (in press).

[11] C. F. Lynch. AGG: augmented graph grammars for complex heterogeneous data. In S. G. Santos and O. C. Santos, editors, Proceedings of the Workshops held at Educational Data Mining 2014, co-located with 7th International Conference on Educational Data Mining (EDM 2014), London, United Kingdom, July 4-7, 2014, volume 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.

[12] C. F. Lynch, T. W. Price, M. Chi, and T. Barnes. Using the hint factory to analyze model-based tutoring systems. In C. F. Lynch, T. Barnes, J. Albert, and M. Eagle, editors, Proceedings of the Second International Workshop on Graph-Based Educational Data Mining (GEDM 2015). CEUR-WS, June 2015. (in press).

[13] T. McTavish. Facilitating graph interpretation via interactive hierarchical edges. In S. G. Santos and O. C. Santos, editors, Proceedings of the Workshops held at Educational Data Mining 2014, co-located with 7th International Conference on Educational Data Mining (EDM 2014), London, United Kingdom, July 4-7, 2014, volume 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.

[14] B. Mostafavi and T. Barnes. Evaluation of logic proof problem difficulty through student performance data. In S. G. Santos and O. C. Santos, editors, Proceedings of the Workshops held at Educational Data Mining 2014, co-located with 7th International Conference on Educational Data Mining (EDM 2014), London, United Kingdom, July 4-7, 2014, volume 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.

[15] A. Poulovassilis, S. G. Santos, and M. Mavrikis. Graph-based modelling of students' interaction data from exploratory learning environments. In C. F. Lynch, T. Barnes, J. Albert, and M. Eagle, editors, Proceedings of the Second International Workshop on Graph-Based Educational Data Mining (GEDM 2015). CEUR-WS, June 2015. (in press).

[16] T. Price and T. Barnes. An exploration of data-driven hint generation in an open-ended programming problem. In C. F. Lynch, T. Barnes, J. Albert, and M. Eagle, editors, Proceedings of the Second International Workshop on Graph-Based Educational Data Mining (GEDM 2015). CEUR-WS, June 2015. (in press).

[17] S. G. Santos and O. C. Santos, editors. Proceedings of the Workshops held at Educational Data Mining 2014, co-located with 7th International Conference on Educational Data Mining (EDM 2014), London, United Kingdom, July 4-7, 2014, volume 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.

[18] V. Sheshadri, C. Lynch, and T. Barnes. Invis: An EDM tool for graphical rendering and analysis of student interaction data. In S. G. Santos and O. C. Santos, editors, Proceedings of the Workshops held at Educational Data Mining 2014, co-located with 7th International Conference on Educational Data Mining (EDM 2014), London, United Kingdom, July 4-7, 2014, volume 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.

[19] K. Vaculík, L. Nezvalová, and L. Popelínsky. Graph mining and outlier detection meet logic proof tutoring. In S. G. Santos and O. C. Santos, editors, Proceedings of the Workshops held at Educational Data Mining 2014, co-located with 7th International Conference on Educational Data Mining (EDM 2014), London, United Kingdom, July 4-7, 2014, volume 1183 of CEUR Workshop Proceedings. CEUR-WS.org, 2014.

[20] L. Vigentini and A. Clayphan. Exploring the function of discussion forums in MOOCs: comparing data mining and graph-based approaches. In C. F. Lynch, T. Barnes, J. Albert, and M. Eagle, editors, Proceedings of the Second International Workshop on Graph-Based Educational Data Mining (GEDM 2015). CEUR-WS, June 2015. (in press).

[21] C. Weber and R. Vas. Studio: Ontology-based educational self-assessment. In C. F. Lynch, T. Barnes, J. Albert, and M. Eagle, editors, Proceedings of the Second International Workshop on Graph-Based Educational Data Mining (GEDM 2015). CEUR-WS, June 2015. (in press).

[22] L. Xue, C. F. Lynch, and M. Chi. Graph grammar induction by genetic programming. In C. F. Lynch, T. Barnes, J. Albert, and M. Eagle, editors, Proceedings of the Second International Workshop on Graph-Based Educational Data Mining (GEDM 2015). CEUR-WS, June 2015. (in press).