=Paper=
{{Paper
|id=Vol-2501/paper7
|storemode=property
|title=Towards Dynamic Intelligent Support for Collaborative Problem Solving
|pdfUrl=https://ceur-ws.org/Vol-2501/paper7.pdf
|volume=Vol-2501
|authors=Sidney D’Mello,Angela E.B. Stewart,Mary Jean Amon,Chen Sun,Nicholas Duran,Valerie Shute
|dblpUrl=https://dblp.org/rec/conf/aied/DMelloSASDS19
}}
==Towards Dynamic Intelligent Support for Collaborative Problem Solving==
Sidney D'Mello1, Angela E. B. Stewart1, Mary Jean Amon1, Chen Sun2, Nicholas Duran2, & Valerie Shute3
1 University of Colorado Boulder, Boulder, CO 80309, USA
2 Arizona State University, Glendale, AZ 85306, USA
3 Florida State University, Tallahassee, FL 32306, USA
sidney.dmello@colorado.edu

Abstract. We discuss progress towards the design of collaborative interfaces that automatically assess key facets of collaborative problem solving (CPS) and intervene accordingly. Our work is grounded in a generalized theoretical model of CPS comprising three major facets (constructing shared knowledge, negotiation/coordination, and maintaining team function), their sub-facets, and verbal and nonverbal indicators. We report the results of two studies that validated the model, followed by speech and language processing techniques used to automate the assessment of the CPS facets. We conclude by discussing future plans for incorporating the models in next-generation CPS interfaces that support dynamic assessment and intelligent intervention.

Keywords: Assessing Collaborative Problem Solving; Natural Language Processing; Machine Learning.

Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1 Introduction

It is widely acknowledged that collaborative problem solving (CPS) is an essential 21st-century skill in our increasingly connected and globalized world [1]. Yet, we know precious little about how to define, measure, and help develop this skill, especially in the context of STEM learning. By increasing our basic understanding of effective CPS processes, we can take a step towards designing next-generation collaborative learning environments that aim to make CPS more enjoyable and effective. Accomplishing this vision requires: (1) identifying effective CPS processes (or facets); (2) automatically monitoring the core CPS processes to enable intervention; and (3) designing and testing the efficacy of intelligent collaborative interfaces with dynamic intervention and/or after-action feedback and reflection. Here, we describe progress on the first two of these components and sketch out ideas for the third.

2 Collaborative Problem Solving Model and its Validation

We synthesized previous research on CPS to construct a generalized CPS competency model (i.e., skills and abilities) from existing frameworks, such as ATC21S [2] and PISA [3], along with classic work on CPS [4, 5]. Since our model is intended to be generalizable, we validated it using data from two very different studies.

2.1 CPS Model

Our model consists of the following core facets: (1) constructing shared knowledge (expresses one's own ideas and attempts to understand others' ideas); (2) negotiation/coordination (achieves an agreed solution plan ready to execute); and (3) maintaining team function (sustains the team dynamics). Each facet has two sub-facets, which in turn have multiple verbal and nonverbal indicators, as shown in Table 1.

2.2 Model Validation

We validated our competency model in two studies [6]. In Study 1, 11 triads of middle school students (8th-9th graders) played Physics Playground (PP) face-to-face for three hours. PP is a 2D educational video game that was developed to support and measure students' learning of conceptual physics [7]. It focuses on Newton's laws of force and motion, mass, gravity, potential and kinetic energy, and conservation of momentum.
Problems (or levels) in PP require students to guide a green ball to a red balloon. The primary way students move the ball is by creating agents, simple machines of force and motion (i.e., ramps, levers, pendulums, and springboards), drawn with colored lines using the mouse, that "come to life" on the screen. For example, Figure 1 (the ultimate pinball level) shows a sample problem where the student must draw a carefully constructed ramp (in purple) to lead the falling ball along a path to the balloon. Students receive silver trophies for any solution to a problem but earn gold trophies for elegant solutions involving a limited number of objects created and used to solve the problem (the threshold varies but is typically < 3).

Fig. 1. A level in Physics Playground.

Table 1. Generalized competency model composed of facets, sub-facets, and indicators.

Constructing shared knowledge: expresses ideas and attempts to understand others' ideas
- Shares understanding of problems and solutions
  - Talks about specific topics/concepts and ideas on problem solving
  - Proposes specific solutions
  - Talks about givens and constraints of a specific task
  - Builds on others' ideas to improve solutions
- Establishes common ground
  - Recognizes and verifies understanding of others' ideas
  - Confirms understanding by asking questions/paraphrasing
  - Repairs misunderstandings
  - Interrupts or talks over others as intrusion (R)

Negotiation/Coordination: achieves an agreed solution plan ready to execute
- Responds to others' questions/ideas
  - Does not respond when spoken to by others (R)
  - Makes fun of, criticizes, or is rude to others (R)
  - Provides reasons to support/refute a potential solution
  - Makes an attempt after discussion
- Monitors execution
  - Talks about results
  - Brings up giving up the challenge (R)

Maintaining team function: sustains the team dynamics
- Fulfills individual roles on the team
  - Not visibly focused on tasks and assigned roles (R)
  - Initiates off-topic conversation (R)
  - Joins off-topic conversation (R)
- Takes initiatives to advance collaboration processes
  - Asks if others have suggestions
  - Asks to take action before anyone on the team asks for help
  - Compliments or encourages others

Note. "R" next to an indicator means that it is reverse coded.

Below is an excerpt of an exchange between two participants (Player A and Player C) during gameplay, along with tags for the relevant indicators.

Player C: "What if you grabbed it upwards. And then drew a pendulum, knock it out. But you drew like farther out, the pendulum." (Proposes specific solutions)
Player A: "I have an idea. Wait, which direction should I swing?" (Confirms understanding by asking questions/paraphrasing)
Player C: "Swing from here to here." (Proposes specific solutions)
Player A: "Nope, then it would just fly to the spider." (Provides reasons to support/refute a potential solution)

In Study 2, 37 undergraduate triads played the Minecraft-themed Hour of Code for 20 minutes using videoconferencing. Hour of Code is an online resource for students in grades two and above to learn basic computer programming principles in an hour. It uses a visual programming language, Blockly (https://developers.google.com/blockly/), to interlock blocks of code (such as loops). Blockly eliminates syntax errors by only interlocking syntactically correct blocks, allowing students to focus on the coding logic and programming principles (see Figure 2).

Fig. 2. Minecraft-themed Hour of Code.
Students can watch their code run (A), choose from a code bank of possible blocks (B), generate code (C), and see their teammates (D).

Below is an excerpt of an actual exchange between all three participants (Players A, B, and C) during gameplay, along with the relevant indicators.

Player C: "Yeah I think so. Cuz we'll fall in, right?" (Provides reasons to support/refute a potential solution)
Player A: "Yeah that's true. Then we wanna place bedrock ahead. Oh, but don't we want to repeat that? One, two, three…" (Proposes specific solutions + Asks if others have suggestions)
Player B: "And we have to move forward" (Proposes specific solutions)

2.3 Summary of Results

In Study 1, we coded the entire three hours of gameplay data based on the CPS model shown in Table 1. In Study 2, we randomly selected a 90-second segment from each five-minute period of the 20-minute videos. Factor analyses indicated a reasonable fit to our theorized model. Correlational analyses provided evidence of the orthogonality of the facets and of their independence from individual differences in prior knowledge, intelligence, and personality. Regression analyses indicated that the facets predicted both subjective and objective outcome measures while controlling for several covariates. Overall, the results support the validity of our CPS model (see [6] for full details).

3 Automated CPS Modeling from Spoken Language

The next step was to automatically model the aforementioned CPS facets. Given the prevalence of verbal communication, we sought to model the data from speech analyzed at the utterance level. Accordingly, we used the IBM Watson Speech to Text service to transcribe participants' audio from Study 2. Three research assistants were trained to code the resultant 11,163 utterances for evidence of the indicators from our CPS competency model. We aggregated the indicators to obtain binary codes for the presence or absence of each of the three core CPS facets per utterance.

3.1 Modeling Approach

We used Random Forest classifiers trained on the frequency counts of words and two-word phrases (bag of n-grams). Additionally, we investigated an alternate word coding method so that features would theoretically generalize to other domains. For this, we used the Linguistic Inquiry and Word Count (LIWC) tool to count the proportion of words in an utterance that belong to each of 73 pre-defined categories (e.g., positive affect). Any LIWC category with a non-zero proportion (i.e., the category was present in the utterance) was added as a unigram.

We used team-level ten-fold nested cross-validation, where all the utterances for a given team were in either the training set or the testing set, but never both, which is important for team-level generalizability. Within each testing fold, the training set was again split into five folds, one of which served as a validation fold for hyperparameter tuning. For each validation fold, a model was fit and scored using every combination of hyperparameters via a grid search. The accuracy scores for each parameter combination across the five validation folds were averaged, and the hyperparameters that resulted in the highest average accuracy were preserved. A model was then fit on the full training set using these best hyperparameters, and predictions were made on the test fold. These predictions were pooled over the ten test folds before final accuracy metrics were computed.
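The paper does not include code; the following is a minimal sketch, assuming scikit-learn, of how the team-level nested cross-validation described above could be realized for one facet using the bag-of-n-grams features. Variable names (`utterances`, `labels`, `team_ids`) are hypothetical placeholders, the grid covers only the n-gram hyperparameters (the full grid, including PMI filtering and class balancing, is described next), and details of the original folding scheme may differ.

```python
# Illustrative sketch (not the authors' code): team-level nested cross-validation
# for predicting a binary CPS facet from utterance text with a Random Forest.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import GridSearchCV, GroupKFold
from sklearn.pipeline import Pipeline

def nested_team_cv(utterances, labels, team_ids):
    """Outer 10-fold CV grouped by team; inner grouped grid search for tuning."""
    utterances, labels, team_ids = map(np.asarray, (utterances, labels, team_ids))
    pipeline = Pipeline([
        ("ngrams", CountVectorizer()),                # counts of words / two-word phrases
        ("rf", RandomForestClassifier(n_estimators=200, random_state=0)),
    ])
    grid = {
        "ngrams__ngram_range": [(1, 1), (1, 2)],      # unigrams vs. unigrams + bigrams
        "ngrams__min_df": [1, 0.01, 0.02],            # minimum document frequency
    }
    y_true, y_prob = [], []
    outer = GroupKFold(n_splits=10)
    for train_idx, test_idx in outer.split(utterances, labels, groups=team_ids):
        # Inner loop: five grouped validation folds drawn only from the training teams.
        inner_splits = GroupKFold(n_splits=5).split(
            utterances[train_idx], labels[train_idx], groups=team_ids[train_idx])
        search = GridSearchCV(pipeline, grid, scoring="accuracy", cv=inner_splits)
        search.fit(utterances[train_idx], labels[train_idx])  # refits best model on full training set
        y_prob.extend(search.predict_proba(utterances[test_idx])[:, 1])
        y_true.extend(labels[test_idx])
    # Predictions are pooled over the ten outer test folds before scoring.
    return roc_auc_score(y_true, y_prob)
```

Grouping both loops by team keeps every team's utterances entirely within either training or testing data, which is the property emphasized above for team-level generalizability.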
We tuned four hyperparameters using this method: (1) whether to include unigrams or bigrams (n-gram models only, not applicable to LIWC categories); (2) whether to use a pointwise mutual information threshold (to filter phrases [8]) of 2 or 4 (bigrams only); (3) the minimum document frequency of n-grams (0%, 1%, or 2%); and (4) the training-set balancing method (random undersampling, random oversampling, or the synthetic minority oversampling technique). Class distributions for the validation and testing sets were left unchanged.

3.2 Summary of Results

Despite imperfect automatic speech recognition (a word error rate of 45%), the n-gram models achieved AUROC (area under the receiver operating characteristic curve) scores of .85, .77, and .77 for construction of shared knowledge, negotiation/coordination, and maintaining team function, respectively (70%, 54%, and 54% improvements over chance). The LIWC-category models achieved similar scores of .82, .74, and .73 (64%, 48%, and 46% improvements over chance).

Next, we used linear mixed effects models to investigate the relationship between the three CPS facets and the following CPS outcome variables assessed at the individual level: posttest score, subjective perception of the team's performance, and subjective perception of the collaboration process (see [9]). To examine whether the human-coded and model-predicted scores yielded similar effects, we constructed separate models for each, resulting in 27 models (3 facets × 3 outcome variables × 3 sources [human vs. n-gram vs. LIWC-category]). We averaged the expert-coded utterance scores and the model-predicted utterance-level probabilities for each participant for inclusion as predictors. We included each individual's total words spoken, ACT score, whether the individual knew his/her teammates, and whether the individual was assigned to interact with the environment as control variables (covariates). Team identity was included as a random factor (intercept only) to account for the nesting of individuals within teams.

We found that the n-gram and LIWC model-derived facet scores yielded coefficients similar to those of the human-coded scores. Specifically, both model-derived scores for construction of shared knowledge positively predicted posttest scores (b = .09, p < .05 for n-grams and b = .08, p < .10 for LIWC), which was similar to the human codes (b = .11, p < .10).

4 Closing the Loop: Providing Feedback on CPS Processes

We plan to embed the validated models into the collaborative environment to monitor and provide feedback on the unfolding CPS processes. For example, if maintaining team function is high but shared knowledge construction is low because one member is consistently dominating, then the system might display the following message: "You all seem to be getting along great! But make sure that everyone on the team gets a chance to contribute solution ideas." Alternatively, if team members are all generally contributing to the problem-solving efforts but there are some issues with communication (specifically active listening, since some members interrupt or talk over others), then the message could be: "Everyone is contributing great solution ideas. Please make sure to listen to each other first before talking."

The precise intervention strategies, when to intervene (in real time or as a mid- or after-task review), how frequently to intervene, how to render the interventions, and the level of intervention (team level, individual level, or both) await design, testing, and refinement.
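To make the kinds of rules described above concrete, the sketch below illustrates one way such feedback logic might eventually be encoded over rolling facet estimates produced by the utterance-level models. It is purely illustrative and not part of the implemented system; the `FacetScores` structure, thresholds, and dominance heuristic are all hypothetical.

```python
# Hypothetical sketch of mapping rolling facet estimates to feedback messages.
# The data structure, thresholds, and dominance heuristic are illustrative only.
from dataclasses import dataclass
from typing import Dict, Optional

@dataclass
class FacetScores:
    shared_knowledge: float        # mean predicted probability over a recent window
    negotiation: float             # negotiation/coordination
    team_function: float           # maintaining team function
    talk_share: Dict[str, float]   # fraction of recent utterances per team member
    interruption_rate: float       # fraction of utterances flagged as interruptions

def choose_feedback(scores: FacetScores,
                    low: float = 0.3, high: float = 0.6,
                    dominance: float = 0.6) -> Optional[str]:
    """Return a team-level message, or None if no intervention seems warranted."""
    one_member_dominating = max(scores.talk_share.values()) > dominance

    # Team gets along well, but ideas come mostly from one member.
    if (scores.team_function >= high and scores.shared_knowledge < low
            and one_member_dominating):
        return ("You all seem to be getting along great! But make sure that everyone "
                "on the team gets a chance to contribute solution ideas.")

    # Everyone contributes ideas, but members interrupt or talk over each other.
    if scores.shared_knowledge >= high and scores.interruption_rate > low:
        return ("Everyone is contributing great solution ideas. Please make sure "
                "to listen to each other first before talking.")

    return None  # do not interrupt teams that are collaborating well

# Example: good rapport, little idea sharing, and one dominant speaker.
example = FacetScores(shared_knowledge=0.2, negotiation=0.5, team_function=0.8,
                      talk_share={"A": 0.7, "B": 0.2, "C": 0.1},
                      interruption_rate=0.1)
print(choose_feedback(example))
```

Whatever form the final rules take, the open questions above (timing, frequency, rendering, and level of intervention) would govern when and to whom such messages are delivered.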
Once a prototype is developed, we will conduct a controlled experiment to evaluate the efficacy of automated CPS feedback relative to an appropriate control condition. Our prediction is that the feedback-enabled system will yield enhanced CPS outcomes, an exciting possibility that ushers in a new generation of CPS environments that support real-time assessment and intelligent intervention.

5 Acknowledgments

This research was supported by the National Science Foundation (NSF DUE 1745442) and the Institute of Education Sciences (IES R305A170432). Any opinions, findings, and conclusions or recommendations expressed in this material are those of the authors and do not necessarily reflect the views of the funding agencies.

6 References

1. Griffin, P., McGaw, B., Care, E.: Assessment and Teaching of 21st Century Skills. Springer, New York (2012).
2. Care, E., Scoular, C., Griffin, P.: Assessment of collaborative problem solving in education environments. Applied Measurement in Education 29(4), 250-264 (2016).
3. Organisation for Economic Co-operation and Development (OECD): PISA 2015 Collaborative Problem Solving Framework (2015).
4. Roschelle, J., Teasley, S.D.: The construction of shared knowledge in collaborative problem solving. In: O'Malley, C. (ed.) Computer Supported Collaborative Learning, pp. 69-97. Springer, Berlin (1995).
5. Nelson, L.M.: Collaborative problem solving. In: Reigeluth, C.M. (ed.) Instructional-Design Theories and Models: A New Paradigm of Instructional Theory, pp. 241-267. Routledge, New York (1999).
6. Sun, C., Shute, V., Stewart, A., Yonehiro, J., Duran, N., D'Mello, S.K.: Toward a generalized competency model of collaborative problem solving (in review).
7. Ploetzner, R., VanLehn, K.: The acquisition of qualitative physics knowledge during textbook-based physics training. Cognition and Instruction 15(2), 169-205 (1997).
8. Park, G., Schwartz, H.A., Eichstaedt, J.C., Kern, M.L., Kosinski, M., Stillwell, D.J., Ungar, L.H., Seligman, M.E.: Automatic personality assessment through social media language. Journal of Personality and Social Psychology 108(6), 934-952 (2015).
9. Stewart, A., D'Mello, S.K.: Connecting the dots towards collaborative AIED: linking group makeup to process to learning. In: Proceedings of the 19th International Conference on Artificial Intelligence in Education (AIED'18), pp. 545-556. Springer (2018).