Toward Integrating Cognitive Tutor Interaction Data with Human Tutoring Text Dialogue Data in LearnSphere

Stephen E. Fancsali, Steven Ritter, Susan R. Berman
Carnegie Learning, Inc.
437 Grant Street, Suite 1906, Pittsburgh, PA 15219, USA
sfancsali@carnegielearning.com, sritter@carnegielearning.com, sberman@carnegielearning.com

Michael Yudelson
Human-Computer Interaction Institute, Carnegie Mellon University
5000 Forbes Avenue, Pittsburgh, PA 15213, USA
myudelson@cs.cmu.edu

Vasile Rus, Donald M. Morrison
Institute for Intelligent Systems & Department of Computer Science, University of Memphis
Memphis, TN, USA
vrus@memphis.edu, chipmorrison@gmail.com

ABSTRACT
We present details of a large, novel dataset that includes both Carnegie Learning Cognitive Tutor interaction data and various data related to learner interactions with human tutors via an online chat while using the Cognitive Tutor. We discuss integrating these two data modalities within LearnSphere and propose workflows (and corresponding analyses) germane to our U.S. Department of Defense Advanced Distributed Learning Initiative-funded Integrating Human and Automated Tutoring Systems (IHATS) project, demonstrating various aspects of the potential of the LearnSphere framework.

Keywords
Intelligent tutoring systems, Cognitive Tutor, DataShop, LearnSphere, multi-modal data, human tutoring, mathematics education

1. INTRODUCTION
The Integrating Human and Automated Tutoring Systems (IHATS) project is a U.S. Department of Defense Advanced Distributed Learning Initiative-funded venture led by Carnegie Learning, Inc., in partnership with researchers from the University of Memphis and Carnegie Mellon University. IHATS leverages a unique dataset collected over the period of June to December 2014 from adult learners in two higher education (developmental) algebra courses that featured Carnegie Learning's Cognitive Tutor (CT) [11] intelligent tutoring system (ITS) for mathematics and provided students with the ability to seek human tutoring via an online chat mechanism.

The goal of the IHATS project is to provide insights into which cognitive factors (e.g., error rates) and non-/meta-cognitive factors (e.g., behavior like "gaming the system" [1-3] and affective states like boredom [5]) are likely to drive students to seek out human tutoring via the available chat mechanism, and into what predicts whether that tutoring is likely to be successful in driving improved learning outcomes. Insight into predictors and possible determinants of human tutoring use, as well as the effectiveness of such use, will provide a foundation for studying "hand-offs" between automated and human tutoring and instructional modalities. Insights into instructional and tutoring hand-offs may have a variety of practical implications. For example, such insights can inform classroom/instructional best practices (e.g., teachers having the ability to guide students as to when it is best for them to ask their fellow students or the teacher for help versus working with the help affordances of an ITS). Such insights may also guide technical design and feature additions for ITSs like the Cognitive Tutor (e.g., an online recommendation system, based on cognitive and non-cognitive factors, that drives a student to the right kind of help, whether human or automated).

The multi-modal data we consider in IHATS include traditional CT learner interaction data of the sort analyzed by a wide variety of educational data mining studies and stored in the Pittsburgh Science of Learning Center's LearnLab DataShop format [9]. In addition to these process data about student interactions with the CT, we have both chat transcripts of student interactions with human tutors and labels that describe various aspects of these tutoring sessions, which we now explain.

Our approach to processing tutoring chat transcript data for analysis has been to use human annotators to provide labels, over a sample of 500 transcripts of human tutoring chat sessions, for specific dialogue acts [13] and dialogue modes that occur within these sessions, as well as overall assessments of the "educational soundness" and learning effectiveness of these sessions. Machine learning classifiers are trained on these human-annotated labels, and these models are then applied to the large sample of approximately 19,000 transcripts to provide meta-data about each human tutoring chat session.
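To make this classify-then-scale-up step concrete, the following is a minimal sketch, not the IHATS annotation pipeline itself: it trains a transcript-level classifier on a small human-annotated sample and then applies it to the remaining, unannotated transcripts. The file names, column names, the single label ("educationally_sound"), and the TF-IDF/logistic-regression model are illustrative placeholders; the project's actual label sets (dialogue acts, modes, session-level judgments) and classifiers are richer.

```python
# Minimal sketch (placeholder names; not the IHATS production pipeline):
# train on ~500 human-annotated chat transcripts, then label the rest.
import pandas as pd
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline

annotated = pd.read_csv("annotated_transcripts.csv")    # text + human label per session
unlabeled = pd.read_csv("unannotated_transcripts.csv")  # text only (~19,000 sessions)

model = make_pipeline(
    TfidfVectorizer(ngram_range=(1, 2), min_df=2),  # word/bigram features from chat text
    LogisticRegression(max_iter=1000),
)

# Estimate how well the human labels generalize before scaling up.
scores = cross_val_score(model, annotated["text"], annotated["educationally_sound"], cv=5)
print("cross-validated accuracy:", scores.mean())

# Fit on all annotated transcripts and generate session-level meta-data for the corpus.
model.fit(annotated["text"], annotated["educationally_sound"])
unlabeled["predicted_educationally_sound"] = model.predict(unlabeled["text"])
unlabeled.to_csv("transcript_metadata.csv", index=False)
```

Session-level predictions produced in this way are the kind of meta-data that we attach to each chat session, as described in Section 2.1 below.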
LearnSphere provides a novel and innovative platform for storage and analysis of these multi-modal educational data. The present paper describes our initial approach to representing these multi-modal data in LearnSphere, details several of the analyses we intend to pursue, and proposes LearnSphere workflows that will enable these and future analyses.

2. WORKFLOW METHOD

2.1 Data Inputs
Cognitive Tutor usage data comprise approximately 88 million actions (i.e., DataShop transactions) from nearly 5,000 learners using CT in the target courses. These data are represented in the PSLC DataShop MySQL format. Additionally, we have (and will have) various annotations (i.e., labels) for each of the human tutoring chat sessions in which learners participated; these will be provided as custom fields in the cf_tx_level_big table of the DataShop MySQL format. Annotations and tags for each tutoring chat session will be associated with entries (i.e., via the transaction_id) in cf_tx_level_big for the CT transaction that immediately precedes the beginning of the chat session.¹

¹ While contractual restrictions forbid us from releasing raw tutoring chat session transcripts, the distributed nature of LearnSphere will allow us to locally (and thus privately) integrate even the chat transcripts, rather than just the annotations/tags associated with these transcripts, within a single DataShop-style database.

Various columns of the tutor_transaction table and several related tables would be used in the workflows we propose, including, but not limited to, outcome, duration, subgoal_id, and problem_id, as well as columns from tables including session, subgoal, and skill that provide information about the particular knowledge components (KCs) or skills to which student actions/transactions are mapped.
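As an illustration of how the annotation custom fields might be pulled alongside the CT transactions to which they are attached, here is a minimal sketch assuming a local DataShop-style MySQL database. The tutor_transaction and cf_tx_level_big table names and the transaction_id, outcome, duration, subgoal_id, and problem_id columns come from the description above; the custom-field column names (custom_field_name, custom_field_value), the "chat_" tag prefix, and the connection details are assumptions made for illustration and may differ from the actual DataShop schema.

```python
# Minimal sketch: pair each chat session's tags (stored as transaction-level
# custom fields) with the CT transaction that immediately precedes the session.
# Custom-field column names and the 'chat_%' tag prefix are assumptions.
import pymysql  # assumes a local, DataShop-style MySQL instance

QUERY = """
SELECT tt.transaction_id,
       tt.outcome,
       tt.duration,
       tt.subgoal_id,
       tt.problem_id,
       cf.custom_field_name,
       cf.custom_field_value
FROM   tutor_transaction tt
JOIN   cf_tx_level_big cf ON cf.transaction_id = tt.transaction_id
WHERE  cf.custom_field_name LIKE 'chat_%'
"""

def fetch_chat_session_tags(host="localhost", user="readonly", password="", db="datashop"):
    """Return rows pairing chat-session annotations with the preceding CT transaction."""
    conn = pymysql.connect(host=host, user=user, password=password, database=db)
    try:
        with conn.cursor() as cur:
            cur.execute(QUERY)
            return cur.fetchall()
    finally:
        conn.close()
```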
2.2 Workflow Model
LearnSphere provides an intuitive user interface and analytics affordances that may assist in a variety of analyses in the IHATS project. The general goal of our workflow(s) is to produce models of meta-/non-cognitive factors for learners in our dataset, including machine-learned models (i.e., "detectors") of gaming-the-system behavior [1-3], off-task behavior [4], and affective states like boredom, frustration, and engaged concentration [5]. Such detector models have proven fruitful in illuminating associations and possible causal relationships among meta-/non-cognitive factors, as well as between such factors and learning (for example, between gaming-the-system behavior and final course exam scores), in a similar population of adult learners using CT in similar algebra courses [8]. These results make us hopeful that such meta-/non-cognitive factors, in addition to cognitive factors like hint use, might help predict whether students turn to human tutoring from the CT and under what conditions that human tutoring tends to be especially successful.

One possible, omnibus workflow would begin with raw DataShop-format data and output the final results of the aforementioned detector models (built for a specific product like CT Algebra), including transaction-level predictions of whether particular actions are likely to be instances of gaming the system or off-task behavior, or whether learners are likely to be experiencing specific affective states in particular "clips" (i.e., time intervals) of CT usage. However, such a workflow can (and should) be broken down into its more general constituent components so that elements of the workflow can be generalized to other products, environments, and settings, including other ITSs and educational technologies. For example, Bayesian Knowledge Tracing (BKT) [7] parameter estimates for knowledge components in our dataset are inputs to later steps in "building" detector models, and LearnSphere already supports a workflow to learn BKT parameter estimates from data.
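Because BKT estimates feed these later detector-building steps, a brief sketch of the standard BKT update may help readers unfamiliar with the model. This is not the LearnSphere BKT workflow's implementation, and the parameter values below are illustrative placeholders rather than estimates from our data.

```python
# Standard Bayesian Knowledge Tracing update for one knowledge component.
# Parameters: p_init (prior mastery), p_learn (transit), p_slip, p_guess.
# The values below are illustrative placeholders, not estimates from our data.

def bkt_update(p_mastery, correct, p_learn=0.15, p_slip=0.1, p_guess=0.2):
    """Posterior mastery after observing one response, then apply learning."""
    if correct:
        evidence = p_mastery * (1 - p_slip)
        posterior = evidence / (evidence + (1 - p_mastery) * p_guess)
    else:
        evidence = p_mastery * p_slip
        posterior = evidence / (evidence + (1 - p_mastery) * (1 - p_guess))
    return posterior + (1 - posterior) * p_learn

def predicted_correctness(p_mastery, p_slip=0.1, p_guess=0.2):
    """Probability that the next response on this KC is correct."""
    return p_mastery * (1 - p_slip) + (1 - p_mastery) * p_guess

# Trace a single student's opportunities on one KC (1 = correct, 0 = incorrect).
p = 0.3  # illustrative p_init
for outcome in [0, 1, 1, 0, 1, 1]:
    print(round(predicted_correctness(p), 3))
    p = bkt_update(p, outcome == 1)
```

Within a LearnSphere workflow, the corresponding component would instead estimate p_init, p_learn, p_slip, and p_guess per KC from the transaction data and pass those estimates downstream to the detector-building steps.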
Detector models for CT also rely on features engineered from fine-grained, transaction-level data. One component of the LearnSphere suite of tools should therefore include the ability to easily generate such features and to let researchers specify new features to be engineered from fine-grained data. Ideally, features could be engineered to range over a variety of levels of aggregation and units of analysis, including the transaction, problem, session, and student level (e.g., enabling calculation of average transaction, problem, or session time as well as total student time), and over different time spans (e.g., hint requests before a learner's first interaction with a human tutor, or on the problem the learner was working on as she initiated a human tutoring chat session). Features used by existing detectors capture facets of fine-grained, transaction-level data that include (but are not limited to) the speed with which actions are taken after making at least one error on a problem-solving step in the CT (to detect gaming the system [1-3]), the maximum number of incorrect actions or hint requests for any particular skill within a "clip"/period of problem-solving time (to detect boredom [5]), and many others. Engineered features can then be provided as input to a variety of statistical and machine learning models, including those that comprise existing detector models, though simpler options like linear regression may also be useful.

To enable multi-modal analysis, feature engineering from text data like our chat session transcripts would likely be helpful for a variety of possible analyses. A goal of the IHATS project will be to combine the results of machine-learned models applied to text data (i.e., classifying dialogue acts, sub-acts, and modes) to determine characteristics of the human tutoring interaction that are associated with improved performance in the CT. Other possibly important text-based features could also be extracted (e.g., content-related features such as how many content words or domain/topic-specific words the student or tutor generated, as in [12]). Affordances within LearnSphere workflows could be designed to allow for such analyses using features engineered from both the CT usage data and the text modality.
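To make the transaction-level feature-engineering step concrete, here is a minimal sketch, assuming the transaction log has been exported to a flat file with (placeholder) columns student_id, clip_id, skill, time, outcome, and duration. It computes a few of the clip-level features mentioned above and fits a simple logistic-regression detector on hypothetical clip labels; production detectors use richer feature sets and model families.

```python
# Minimal sketch (illustrative column names and labels; not Carnegie Learning's
# detector code): engineer clip-level features and fit a simple detector.
import pandas as pd
from sklearn.linear_model import LogisticRegression

tx = pd.read_csv("tutor_transactions.csv").sort_values(["student_id", "time"])

# Flag errors/hints and note fast actions taken right after an error (a gaming cue).
tx["is_error"] = (tx["outcome"] == "INCORRECT").astype(int)
tx["is_hint"] = (tx["outcome"] == "HINT").astype(int)
prev_error = tx.groupby("student_id")["is_error"].shift(1).fillna(0)
tx["fast_after_error"] = ((prev_error == 1) & (tx["duration"] < 2.0)).astype(int)

# Max incorrect actions / hint requests for any single skill within a clip (boredom cue).
per_skill = (tx.groupby(["student_id", "clip_id", "skill"])
               .agg(errors=("is_error", "sum"), hints=("is_hint", "sum"))
               .reset_index())
skill_max = per_skill.groupby(["student_id", "clip_id"]).agg(
    max_errors_any_skill=("errors", "max"),
    max_hints_any_skill=("hints", "max"))

# Other clip-level aggregates (speed after errors, average step duration).
clip_agg = tx.groupby(["student_id", "clip_id"]).agg(
    fast_actions_after_error=("fast_after_error", "sum"),
    mean_duration=("duration", "mean"))
clips = skill_max.join(clip_agg).reset_index()

# Fit a simple detector on hand-labeled clips (labels here are hypothetical).
features = ["max_errors_any_skill", "max_hints_any_skill",
            "fast_actions_after_error", "mean_duration"]
labeled = clips.merge(pd.read_csv("clip_labels.csv"), on=["student_id", "clip_id"])
detector = LogisticRegression(max_iter=1000).fit(labeled[features], labeled["gaming"])
clips["p_gaming"] = detector.predict_proba(clips[features])[:, 1]
```

A feature-generation component of this kind could equally aggregate to the problem, session, or student level, and the resulting clip table, augmented with chat-based features, is the sort of transformed dataset described under Workflow Outputs below.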
2.3 Workflow Outputs
A variety of possible outputs can result from our proposed workflow(s). BKT parameters for KCs estimated from our large dataset of approximately 88 million learner actions may be of use to EDM and learning analytics researchers, and those models can be evaluated by a variety of metrics available within DataShop/LearnSphere. Transformed datasets, with columns corresponding to features engineered within the workflow (and rows corresponding to appropriate units of analysis/aggregation), can be exported for use in statistical and other software tools. Specific tools for the EDM community, including detector models of behavior and affect (based on engineered features and on models parameterized/estimated for particular products and systems), could also be included as affordances of workflows.

Assuming required general analytical tools are incorporated into LearnSphere, statistical models of relationships among such learner behaviors and affective states (i.e., the outputs of detectors) could also be the output of a workflow (i.e., using the results of particular models as input to other models in a "discovery with models" approach [6, 8]). For the IHATS project, we are likely to pursue modeling within the framework of algorithmic search for graphical causal models [14] to uncover possible causal relationships among particular behaviors, affective states, learner interactions with human tutoring, and learning. This approach has been fruitfully applied in analyses of CT data from a similar population [8] as well as in other EDM and learning analytics studies (e.g., [10], among others). Such search techniques could, in principle, be made available within LearnSphere workflows (among a bevy of other statistical and machine learning techniques) to provide a "one-stop-(Data)Shop" for applying such analyses to learner data.

3. DISCUSSION
LearnSphere provides an exciting opportunity to make a variety of general-interest workflows available to the broader EDM community of researchers. As we have outlined above, investigators' choice(s) of workflows (and components thereof) are likely to depend on their scientific interests and goals. Rich learner data from environments like ITSs, educational games, and MOOCs can readily be used to make progress on a variety of questions about cognitive modeling and data-driven improvements to student/KC models, as well as on questions about the relationships between cognitive, non-cognitive, and meta-cognitive factors at play as learners interact with such systems. LearnSphere pushes the boundaries of the types of multi-modal data with which researchers will be able to work more easily, including the human tutoring chat transcripts (and corresponding meta-data) in the IHATS project, to pursue innovative research in the learning sciences. By making analyses of such learner data more readily generalizable and replicable, LearnSphere will provide a great service to the learning sciences and EDM community.

4. ACKNOWLEDGMENTS
The IHATS project is funded by the U.S. Department of Defense, Advanced Distributed Learning Initiative, Contract W911QY-15-C-0070.

5. REFERENCES
[1] Baker, R.S.J.d., Corbett, A.T., Koedinger, K.R., Evenson, E., Roll, I., Wagner, A.Z., Naim, M., Raspat, J., Baker, D.J., Beck, J. 2006. Adapting to when students game an intelligent tutoring system. In Proceedings of the 8th International Conference on Intelligent Tutoring Systems (Jhongli, Taiwan, 2006). 392-401.
[2] Baker, R.S., Corbett, A.T., Koedinger, K.R., Wagner, A.Z. 2004. Off-task behavior in the Cognitive Tutor classroom: when students "game the system." In Proceedings of ACM CHI 2004: Computer-Human Interaction (Vienna, Austria, 2004). 383-390.
[3] Baker, R.S.J.d., Corbett, A.T., Roll, I., Koedinger, K.R. 2008. Developing a generalizable detector of when students game the system. User Model. User-Adap. 18 (2008), 287-314.
[4] Baker, R.S.J.d. 2007. Modeling and understanding students' off-task behavior in intelligent tutoring systems. In Proceedings of ACM CHI 2007: Computer-Human Interaction (San Jose, CA, April 28 - May 3, 2007). ACM, New York, 1059-1068.
[5] Baker, R.S.J.d., Gowda, S.M., Wixon, M., Kalka, J., Wagner, A.Z., Salvi, A., Aleven, V., Kusbit, G.W., Ocumpaugh, J., Rossi, L. 2012. Towards sensor-free affect detection in Cognitive Tutor Algebra. In Proceedings of the 5th International Conference on Educational Data Mining (Chania, Greece, June 19-21, 2012). International Educational Data Mining Society, 126-133.
[6] Baker, R.S.J.d., Yacef, K. 2009. The state of educational data mining in 2009: a review and future visions. Journal of Educational Data Mining 1 (2009), 3-17.
[7] Corbett, A.T., Anderson, J.R. 1995. Knowledge tracing: modeling the acquisition of procedural knowledge. User Model. User-Adap. 4 (1995), 253-278.
[8] Fancsali, S.E. 2014. Causal discovery with models: behavior, affect, and learning in Cognitive Tutor Algebra. In Proceedings of the 7th International Conference on Educational Data Mining (London, UK, July 4-7, 2014). International Educational Data Mining Society, 28-35.
[9] Koedinger, K.R., Baker, R.S.J.d., Cunningham, K., Skogsholm, A., Leber, B., Stamper, J. 2011. A data repository for the EDM community: the PSLC DataShop. In Handbook of Educational Data Mining, C. Romero, S. Ventura, M. Pechenizkiy, & R.S.J.d. Baker, Eds. CRC, Boca Raton, FL.
[10] Koedinger, K.R., Kim, J., Jia, J.Z., McLaughlin, E.A., Bier, N.L. 2015. Learning is not a spectator sport: doing is better than watching for learning from a MOOC. In Proceedings of the Second (2015) ACM Conference on Learning@Scale (Vancouver, Canada, March 14-15, 2015). ACM, New York, 111-120.
[11] Ritter, S., Anderson, J.R., Koedinger, K.R., Corbett, A.T. 2007. Cognitive Tutor: applied research in mathematics education. Psychon. B. Rev. 14 (2007), 249-255.
[12] Rus, V., Stefanescu, D. 2016. Non-intrusive assessment of learners' prior knowledge in dialogue-based intelligent tutoring systems. International Journal of Smart Learning Environments 3 (2016), 1-18.
[13] Searle, J.R. 1969. Speech Acts: An Essay in the Philosophy of Language. Cambridge University Press, Cambridge.
[14] Spirtes, P., Glymour, C., Scheines, R. 2000. Causation, Prediction, and Search. 2nd Edition. MIT Press, Cambridge, MA.