Toward Integrating Cognitive Tutor Interaction Data with Human Tutoring Text Dialogue Data in LearnSphere

Stephen E. Fancsali, Steven Ritter, Susan R. Berman
Carnegie Learning, Inc.
437 Grant Street, Suite 1906, Pittsburgh, PA 15219, USA
sfancsali@carnegielearning.com, sritter@carnegielearning.com, sberman@carnegielearning.com

Michael Yudelson
Human-Computer Interaction Institute, Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213, USA
myudelson@cs.cmu.edu

Vasile Rus, Donald M. Morrison
Institute for Intelligent Systems & Department of Computer Science, University of Memphis
Memphis, TN, USA
vrus@memphis.edu, chipmorrison@gmail.com

ABSTRACT
We present details of a large, novel dataset that includes both Carnegie Learning Cognitive Tutor interaction data and various data related to learner interactions with human tutors via an online chat while using the Cognitive Tutor. We discuss integrating these two data modalities within LearnSphere and propose workflows (and corresponding analyses) germane to our U.S. Department of Defense Advanced Distributed Learning Initiative-funded Integrating Human and Automated Tutoring Systems (IHATS) project, demonstrating various aspects of the potential of the LearnSphere framework.

Keywords
Intelligent tutoring systems, Cognitive Tutor, DataShop, LearnSphere, multi-modal data, human tutoring, mathematics education

1. INTRODUCTION
The Integrating Human and Automated Tutoring Systems (IHATS) project is a U.S. Department of Defense Advanced Distributed Learning Initiative-funded venture led by Carnegie Learning, Inc., in partnership with researchers from the University of Memphis and Carnegie Mellon University. IHATS leverages a unique dataset collected over the period of June to December 2014 from adult learners in two higher education (developmental) algebra courses that featured Carnegie Learning's Cognitive Tutor (CT) [11] intelligent tutoring system (ITS) for mathematics and provided students with the ability to seek human tutoring via an online chat mechanism.

The goal of the IHATS project is to provide insights into which cognitive factors (e.g., error rates) and non-/meta-cognitive factors (e.g., behavior like "gaming the system" [1-3] and affective states like boredom [5]) are likely to drive students to seek out human tutoring via the available chat mechanism, and into what predicts whether that tutoring is likely to be successful in driving improved learning outcomes. Insight into predictors and possible determinants of human tutoring use, as well as the effectiveness of such use, will provide a foundation for studying "hand-offs" between automated and human tutoring and instructional modalities. Insights into instructional and tutoring hand-offs may have a variety of practical implications. For example, such insights can inform classroom/instructional best practices (e.g., teachers having the ability to guide students as to when it is best for them to ask their fellow students or the teacher for help versus working with the help affordances of an ITS). Such insights may also guide technical design and feature additions for ITSs like the Cognitive Tutor (e.g., an online recommendation system, based on cognitive and non-cognitive factors, that drives a student to the right kind of help, whether human or automated).

The multi-modal data we consider in IHATS include traditional CT learner interaction data of the sort analyzed by a wide variety of educational data mining studies and stored in the Pittsburgh Science of Learning Center's LearnLab DataShop format [9]. In addition to these process data about student interactions with the CT, we have both chat transcripts of student interactions with human tutors and labels that describe various aspects of these tutoring sessions, which we now explain.

Our approach to processing tutoring chat transcript data for analysis has been to use human annotators to provide labels, over a sample of 500 transcripts of human tutoring chat sessions, for specific dialogue acts [13] and dialogue modes that occur within these sessions, as well as overall assessments of the "educational soundness" and learning effectiveness of these sessions. Machine learning classifiers are trained on these human-annotated labels, and these models are then applied to the large sample of approximately 19,000 transcripts to provide meta-data about each human tutoring chat session.
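To make this classify-then-scale-up step concrete, the following is a minimal sketch, not the IHATS annotation pipeline itself: it trains a transcript-level classifier on a small human-annotated sample and then applies it to the remaining, unannotated transcripts. The file names, column names, the single label ("educationally_sound"), and the TF-IDF/logistic-regression model are illustrative placeholders; the project's actual label sets (dialogue acts, modes, session-level judgments) and classifiers are richer.

```python
# Minimal sketch (placeholder names; not the IHATS production pipeline):
# train on ~500 human-annotated chat transcripts, then label the rest.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

annotated = pd.read_csv("annotated_transcripts.csv")    # text + human label per session
unlabeled = pd.read_csv("unannotated_transcripts.csv")  # text only (~19,000 sessions)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),  # word/bigram features from chat text
    LogisticRegression(max_iter=1000),
)

# Estimate how well the human labels generalize before scaling up.
scores = cross_val_score(model, annotated["text"], annotated["educationally_sound"], cv=5)
print("cross-validated accuracy:", scores.mean())

# Fit on all annotated transcripts and generate session-level meta-data for the corpus.
model.fit(annotated["text"], annotated["educationally_sound"])
unlabeled["predicted_educationally_sound"] = model.predict(unlabeled["text"])
unlabeled.to_csv("transcript_metadata.csv", index=False)
```

Session-level predictions produced in this way are the kind of meta-data that we attach to each chat session, as described in Section 2.1 below.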
LearnSphere provides a novel and innovative platform for storage and analysis of these multi-modal educational data. The present paper describes our initial approach to representing these multi-modal data in LearnSphere, details several of the analyses we intend to pursue, and proposes LearnSphere workflows that will enable these and future analyses.

2. WORKFLOW METHOD

2.1 Data Inputs
Cognitive Tutor usage data comprise approximately 88 million actions (i.e., DataShop transactions) from nearly 5,000 learners using CT in the target courses. These data are represented in the PSLC DataShop MySQL format. Additionally, we have (and will have) various annotations (i.e., labels) for each of the human tutoring chat sessions in which learners participated; these will be provided as custom fields in the cf_tx_level_big table of the DataShop MySQL format. Annotations and tags for each tutoring chat session will be associated with entries (i.e., via the transaction_id) in cf_tx_level_big for the CT transaction that immediately precedes the beginning of the chat session.¹

¹ While contractual restrictions forbid us from releasing raw tutoring chat session transcripts, the distributed nature of LearnSphere will allow us to locally (and thus privately) integrate even the chat transcripts, rather than just the annotations/tags associated with these transcripts, within a single DataShop-style database.

Various columns of the tutor_transaction table and several related tables would be used in the workflows we propose, including, but not limited to, outcome, duration, subgoal_id, and problem_id, as well as columns from tables including session, subgoal, and skill that provide information about the particular knowledge components (KCs) or skills to which student actions/transactions are mapped.
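As an illustration of how the annotation custom fields might be pulled alongside the CT transactions to which they are attached, here is a minimal sketch assuming a local DataShop-style MySQL database. The tutor_transaction and cf_tx_level_big table names and the transaction_id, outcome, duration, subgoal_id, and problem_id columns come from the description above; the custom-field column names (custom_field_name, custom_field_value), the "chat_" tag prefix, and the connection details are assumptions made for illustration and may differ from the actual DataShop schema.

```python
# Minimal sketch: pair each chat session's tags (stored as transaction-level
# custom fields) with the CT transaction that immediately precedes the session.
# Custom-field column names and the 'chat_%' tag prefix are assumptions.
import pymysql  # assumes a local, DataShop-style MySQL instance

QUERY = """
SELECT tt.transaction_id,
       tt.outcome,
       tt.duration,
       tt.subgoal_id,
       tt.problem_id,
       cf.custom_field_name,
       cf.custom_field_value
FROM   tutor_transaction tt
JOIN   cf_tx_level_big cf ON cf.transaction_id = tt.transaction_id
WHERE  cf.custom_field_name LIKE 'chat_%'
"""

def fetch_chat_session_tags(host="localhost", user="readonly", password="", db="datashop"):
    """Return rows pairing chat-session annotations with the preceding CT transaction."""
    conn = pymysql.connect(host=host, user=user, password=password, database=db)
    try:
        with conn.cursor() as cur:
            cur.execute(QUERY)
            return cur.fetchall()
    finally:
        conn.close()
```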
2.2 Workflow Model
LearnSphere provides an intuitive user interface and analytics affordances that may assist in a variety of analyses in the IHATS project. The general goal of our workflow(s) is to produce models of meta-/non-cognitive factors for learners in our dataset, including machine-learned models (i.e., "detectors") of gaming-the-system behavior [1-3], off-task behavior [4], and affective states like boredom, frustration, and engaged concentration [5]. Such detector models have proven fruitful in illuminating associations and possible causal relationships among meta-/non-cognitive factors, as well as between such factors and learning (for example, between gaming-the-system behavior and final course exam scores), in a similar population of adult learners using CT in similar algebra courses [8]. These results make us hopeful that such meta-/non-cognitive factors, in addition to cognitive factors like hint use, might help predict whether students turn to human tutoring from the CT and under what conditions that human tutoring tends to be especially successful.

One possible, omnibus workflow would begin with raw DataShop-format data and output the final results of the aforementioned detector models (built for a specific product like CT Algebra), including transaction-level predictions of whether particular actions are likely to be instances of gaming the system or off-task behavior, or whether learners are likely to be experiencing specific affective states in particular "clips" (i.e., time intervals) of CT usage. However, such a workflow can (and should) be broken down into its more general constituent components so that elements of the workflow can be generalized to other products, environments, and settings, including other ITSs and educational technologies. For example, Bayesian Knowledge Tracing (BKT) [7] parameter estimates for knowledge components in our dataset are inputs to later steps in "building" detector models, and LearnSphere already supports a workflow to learn BKT parameter estimates from data.
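Because BKT estimates feed these later detector-building steps, a brief sketch of the standard BKT update may help readers unfamiliar with the model. This is not the LearnSphere BKT workflow's implementation, and the parameter values below are illustrative placeholders rather than estimates from our data.

```python
# Standard Bayesian Knowledge Tracing update for one knowledge component.
# Parameters: p_init (prior mastery), p_learn (transit), p_slip, p_guess.
# The values below are illustrative placeholders, not estimates from our data.

def bkt_update(p_mastery, correct, p_learn=0.15, p_slip=0.1, p_guess=0.2):
    """Posterior mastery after observing one response, then apply learning."""
    if correct:
        evidence = p_mastery * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_mastery) * p_guess)
    else:
        evidence = p_mastery * p_slip
        posterior = evidence / (evidence + (1 - p_mastery) * (1 - p_guess))
    return posterior + (1 - posterior) * p_learn

def predicted_correctness(p_mastery, p_slip=0.1, p_guess=0.2):
    """Probability that the next response on this KC is correct."""
    return p_mastery * (1 - p_slip) + (1 - p_mastery) * p_guess

# Trace a single student's opportunities on one KC (1 = correct, 0 = incorrect).
p = 0.3  # illustrative p_init
for outcome in [0, 1, 1, 0, 1, 1]:
    print(round(predicted_correctness(p), 3))
    p = bkt_update(p, outcome == 1)
```

Within a LearnSphere workflow, the corresponding component would instead estimate p_init, p_learn, p_slip, and p_guess per KC from the transaction data and pass those estimates downstream to the detector-building steps.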
Detector models for CT also rely on features engineered from fine-grained, transaction-level data. One component of the LearnSphere suite of tools should therefore include the ability to easily generate such features and to let researchers specify new features to be engineered from fine-grained data. Ideally, features could be engineered to range over a variety of levels of aggregation and units of analysis, including the transaction, problem, session, and student level (e.g., enabling calculation of average transaction, problem, or session time as well as total student time), and over different time spans (e.g., hint requests before a learner's first interaction with a human tutor, or on the problem the learner was working on as she initiated a human tutoring chat session). Features used by existing detectors capture facets of fine-grained, transaction-level data that include (but are not limited to) the speed with which actions are taken after making at least one error on a problem-solving step in the CT (to detect gaming the system [1-3]), the maximum number of incorrect actions or hint requests for any particular skill within a "clip"/period of problem-solving time (to detect boredom [5]), and many others. Engineered features can then be provided as input to a variety of statistical and machine learning models, including those that comprise existing detector models, though simpler options like linear regression may also be useful.

To enable multi-modal analysis, feature engineering from text data like our chat session transcripts would likely be helpful for a variety of possible analyses. A goal of the IHATS project will be to combine the results of machine-learned models applied to text data (i.e., classifying dialogue acts, sub-acts, and modes) to determine characteristics of the human tutoring interaction that are associated with improved performance in the CT. Other possibly important text-based features could also be extracted (e.g., content-related features such as how many content words or domain/topic-specific words the student or tutor generated, as in [12]). Affordances within LearnSphere workflows could be designed to allow for such analyses using features engineered from both the CT usage data and the text modality.
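To make the transaction-level feature-engineering step concrete, here is a minimal sketch, assuming the transaction log has been exported to a flat file with (placeholder) columns student_id, clip_id, skill, time, outcome, and duration. It computes a few of the clip-level features mentioned above and fits a simple logistic-regression detector on hypothetical clip labels; production detectors use richer feature sets and model families.

```python
# Minimal sketch (illustrative column names and labels; not Carnegie Learning's
# detector code): engineer clip-level features and fit a simple detector.
import pandas as pd
from sklearn.linear_model import LogisticRegression

tx = pd.read_csv("tutor_transactions.csv").sort_values(["student_id", "time"])

# Flag errors/hints and note fast actions taken right after an error (a gaming cue).
tx["is_error"] = (tx["outcome"] == "INCORRECT").astype(int)
tx["is_hint"] = (tx["outcome"] == "HINT").astype(int)
prev_error = tx.groupby("student_id")["is_error"].shift(1).fillna(0)
tx["fast_after_error"] = ((prev_error == 1) & (tx["duration"] < 2.0)).astype(int)

# Max incorrect actions / hint requests for any single skill within a clip (boredom cue).
per_skill = (tx.groupby(["student_id", "clip_id", "skill"])
               .agg(errors=("is_error", "sum"), hints=("is_hint", "sum"))
               .reset_index())
skill_max = per_skill.groupby(["student_id", "clip_id"]).agg(
    max_errors_any_skill=("errors", "max"),
    max_hints_any_skill=("hints", "max"))

# Other clip-level aggregates (speed after errors, average step duration).
clip_agg = tx.groupby(["student_id", "clip_id"]).agg(
    fast_actions_after_error=("fast_after_error", "sum"),
    mean_duration=("duration", "mean"))
clips = skill_max.join(clip_agg).reset_index()

# Fit a simple detector on hand-labeled clips (labels here are hypothetical).
features = ["max_errors_any_skill", "max_hints_any_skill",
            "fast_actions_after_error", "mean_duration"]
labeled = clips.merge(pd.read_csv("clip_labels.csv"), on=["student_id", "clip_id"])
detector = LogisticRegression(max_iter=1000).fit(labeled[features], labeled["gaming"])
clips["p_gaming"] = detector.predict_proba(clips[features])[:, 1]
```

A feature-generation component of this kind could equally aggregate to the problem, session, or student level, and the resulting clip table, augmented with chat-based features, is the sort of transformed dataset described under Workflow Outputs below.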
2.3 Workflow Outputs
A variety of possible outputs can result from our proposed workflow(s). BKT parameters for KCs estimated from our large dataset of approximately 88 million learner actions may be of use to EDM and learning analytics researchers, and those models can be evaluated by a variety of metrics available within DataShop/LearnSphere. Transformed datasets, with columns corresponding to features engineered within the workflow (and rows corresponding to appropriate units of analysis/aggregation), can be exported for use in statistical and other software tools. Specific tools for the EDM community, including detector models of behavior and affect (based on engineered features and on models parameterized/estimated for particular products and systems), could also be included as affordances of workflows.

Assuming required general analytical tools are incorporated into LearnSphere, statistical models of relationships among such learner behaviors and affective states (i.e., the outputs of detectors) could also be the output of a workflow (i.e., using the results of particular models as input to other models in a "discovery with models" approach [6, 8]). For the IHATS project, we are likely to pursue modeling within the framework of algorithmic search for graphical causal models [14] to uncover possible causal relationships among particular behaviors, affective states, learner interactions with human tutoring, and learning. This approach has been fruitfully applied in analyses of CT data from a similar population [8] as well as in other EDM and learning analytics studies (e.g., [10], among others). Such search techniques could, in principle, be made available within LearnSphere workflows (among a bevy of other statistical and machine learning techniques) to provide a "one-stop-(Data)Shop" for applying such analyses to learner data.

3. DISCUSSION
LearnSphere provides an exciting opportunity to make a variety of general-interest workflows available to the broader EDM community of researchers. As we have outlined above, investigators' choice(s) of workflows (and components thereof) are likely to depend on their scientific interests and goals. Rich learner data from environments like ITSs, educational games, and MOOCs can readily be used to make progress on a variety of questions about cognitive modeling and data-driven improvements to student/KC models, as well as on questions about the relationships between cognitive, non-cognitive, and meta-cognitive factors at play as learners interact with such systems. LearnSphere pushes the boundaries of the types of multi-modal data with which researchers will be able to work more easily, including the human tutoring chat transcripts (and corresponding meta-data) in the IHATS project, to pursue innovative research in the learning sciences. By making analyses of such learner data more readily generalizable and replicable, LearnSphere will provide a great service to the learning sciences and EDM community.

4. ACKNOWLEDGMENTS
The IHATS project is funded by the U.S. Department of Defense, Advanced Distributed Learning Initiative, Contract W911QY-15-C-0070.

5. REFERENCES
[1] Baker, R.S.J.d., Corbett, A.T., Koedinger, K.R., Evenson, E., Roll, I., Wagner, A.Z., Naim, M., Raspat, J., Baker, D.J., Beck, J. 2006. Adapting to when students game an intelligent tutoring system. In Proceedings of the 8th International Conference on Intelligent Tutoring Systems (Jhongli, Taiwan, 2006). 392-401.
[2] Baker, R.S., Corbett, A.T., Koedinger, K.R., Wagner, A.Z. 2004. Off-task behavior in the Cognitive Tutor classroom: when students "game the system." In Proceedings of ACM CHI 2004: Computer-Human Interaction (Vienna, Austria, 2004). 383-390.
[3] Baker, R.S.J.d., Corbett, A.T., Roll, I., Koedinger, K.R. 2008. Developing a generalizable detector of when students game the system. User Model. User-Adap. 18 (2008), 287-314.
[4] Baker, R.S.J.d. 2007. Modeling and understanding students' off-task behavior in intelligent tutoring systems. In Proceedings of ACM CHI 2007: Computer-Human Interaction (San Jose, CA, April 28 - May 3, 2007). ACM, New York, 1059-1068.
[5] Baker, R.S.J.d., Gowda, S.M., Wixon, M., Kalka, J., Wagner, A.Z., Salvi, A., Aleven, V., Kusbit, G.W., Ocumpaugh, J., Rossi, L. 2012. Towards sensor-free affect detection in Cognitive Tutor Algebra. In Proceedings of the 5th International Conference on Educational Data Mining (Chania, Greece, June 19-21, 2012). International Educational Data Mining Society, 126-133.
[6] Baker, R.S.J.d., Yacef, K. 2009. The state of educational data mining in 2009: a review and future visions. Journal of Educational Data Mining 1 (2009), 3-17.
[7] Corbett, A.T., Anderson, J.R. 1995. Knowledge tracing: modeling the acquisition of procedural knowledge. User Model. User-Adap. 4 (1995), 253-278.
[8] Fancsali, S.E. 2014. Causal discovery with models: behavior, affect, and learning in Cognitive Tutor Algebra. In Proceedings of the 7th International Conference on Educational Data Mining (London, UK, July 4-7, 2014). International Educational Data Mining Society, 28-35.
[9] Koedinger, K.R., Baker, R.S.J.d., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J. 2011. A data repository for the EDM community: the PSLC DataShop. In Handbook of Educational Data Mining, C. Romero, S. Ventura, M. Pechenizkiy, & R.S.J.d. Baker, Eds. CRC, Boca Raton, FL.
[10] Koedinger, K.R., Kim, J., Jia, J.Z., McLaughlin, E.A., Bier, N.L. 2015. Learning is not a spectator sport: doing is better than watching for learning from a MOOC. In Proceedings of the Second (2015) ACM Conference on Learning@Scale (Vancouver, Canada, March 14-15, 2015). ACM, New York, 111-120.
[11] Ritter, S., Anderson, J.R., Koedinger, K.R., Corbett, A.T. 2007. Cognitive Tutor: applied research in mathematics education. Psychon. B. Rev. 14 (2007), 249-255.
[12] Rus, V., Stefanescu, D. 2016. Non-intrusive assessment of learners' prior knowledge in dialogue-based intelligent tutoring systems. International Journal of Smart Learning Environments 3 (2016), 1-18.
[13] Searle, J.R. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge.
[14] Spirtes, P., Glymour, C., Scheines, R. 2000. Causation, Prediction, and Search. 2nd Edition. MIT Press, Cambridge, MA.