Analysis of Human-to-Human Tutorial Dialogues: Insights for Teaching Analytics

Irene-Angelica Chounta, Bruce M. McLaren
Human-Computer Interaction Institute
Carnegie Mellon University
Pittsburgh PA, 15213, USA
{ichounta,bmclaren}@cs.cmu.edu

Patricia Albacete, Pamela Jordan, Sandra Katz
Learning Research and Development Center
University of Pittsburgh
Pittsburgh PA, 15260, USA
{palbacet, pjordan, katz}@pitt.edu

ABSTRACT
In this paper we present a preliminary analysis of human-to-human tutorial dialogues as a precursor to developing an adaptive tutorial dialogue system, guided by a student model. One of our main goals is to further understand what makes tutorial dialogue successful, in particular how tutorial dialogues adapt to different student characteristics and prior knowledge and how to provide feedback to students in order to further support their practice. We aim to identify important factors that affect tutorial dialogues and to characterize the level of support provided to students with different levels of understanding. Our approach and findings could also inform teaching and teaching analytics.

CCS Concepts
• Applied computing ➝ Education.

Keywords
Tutorial dialogues; level of support; student modeling; adaptive dialogue; teaching; learning

1. INTRODUCTION
The learning process and its outcome highly depend on the social interaction between teachers and students and, in particular, on the proficient, helpful and focused use of language that students are exposed to by their teachers through written texts or tutoring discussions [20]. The findings from this research gave rise to several methodologies that promote dialogue as a means of keeping students engaged and motivated (such as instructional conversation [23]). This shift to socially-oriented methods was also observed for technology-enhanced learning contexts and for intelligent tutoring systems [26].

Our research goal is to develop an adaptive tutorial dialogue system, guided by a student model. In this paper we present a preliminary analysis of human-to-human tutorial dialogues as a precursor to achieving our main objective. Our approach and findings could also be informative to teaching and amenable to teaching analytics.

A key focus of our project is to further understand what makes tutorial dialogue successful; in particular, how tutorial dialogues adapt to different student characteristics and prior knowledge and how to provide feedback to students in order to further support their practice. According to Vygotsky, tutors use their assessment of students’ ability to adapt the level of discussion to the student’s “zone of proximal development” (ZPD), that is, a little bit beyond the student’s current level of understanding about a concept, ability to perform a skill, etc. [25]. In particular, we are interested in the following questions:

• How can we define the “level of support” that a successful tutor gives during tutoring?
• What makes some help given by a tutor more generous or stingy, easy or challenging, straightforward or “cognitively complex” for students?

Several researchers have addressed these questions, in various domains, for various purposes: for example, to develop instructional materials for classroom and computer-based learning environments; to address questions about how scaffolding and its counterpart, “contingent tutoring”, take place in naturalistic learning settings in order to guide teacher training; and to measure the effectiveness of curricula that implement scaffolding as a key feature. It is not surprising that the diverse set of goals driving the quest for ways to operationalize “levels of support” would produce an equally diverse set of descriptive frameworks.

The aim of this paper is thus threefold:
1. To explore how teachers regulate discussions and adapt their levels of support during tutorial dialogues,
2. To identify the factors that define “level of support” (LOS) in human-to-human tutoring examples, and
3. To propose analytics and mechanisms to guide tutors in orchestrating effective and efficient interventions in adaptive tutorial dialogues.

In the following sections, we present the process of identifying important factors and constructing a coding scheme to characterize the level of support in tutorial dialogues. However, our aim is not to use this scheme to analyze tutorial dialogues, but to provide guidance to dialogue authors for tailoring the level of support to provide to students who exhibit different levels of understanding.

2. RELATED WORK
Several researchers who have examined the question, “Why is human tutoring so effective?”, have proposed that this effect is due to the highly interactive nature of human tutoring, in other words, the degree to which the student and tutor react to and build upon each other’s dialogue moves and perceived understanding. This has been called the Interaction Hypothesis (e.g., [4,11,22]). However, an important line of research carried out over the past decade to test this hypothesis has found that it is not how much interaction takes place during tutoring that is important, nor the granularity of the interaction (for example, whether the student and tutor discuss a step towards solving a problem or the sub-steps that lead to that step). Rather, what matters is how well the interaction is carried out: for example, what content is addressed and how it is addressed, in a particular dialogue context (e.g., [5,6]).

In order to study and analyze the dynamics of dialogue, either in the classroom or in one-on-one settings, researchers have attempted to identify distinctive features of instructional dialogue and to define schemes for characterizing “level of support”. Although most of the existing coding schemes were developed for problem-solving or other task-oriented domains, they may also prove relevant for operationalizing “level of support” for conceptually-oriented, reflective dialogues. The diverse set of goals driving the quest for ways to operationalize “levels of support” has produced an equally diverse set of descriptive frameworks. However, these schemes can be grouped according to the underlying, typically tacit dimensions that their developers used to differentiate the “levels of support” included in their coding schemes. The most common dimensions are the degree of detail (or specificity) in the tutor’s help and the level of “cognitive complexity” in the tutor’s comment, question, or directive. For example, Van de Pol’s approach to measuring scaffolding involves characterizing the teacher’s “level of control” [19]. The main dimensions in this scheme are the degree of “openness” or detail in the requested response, the length of the requested response and the amount of new content that the teacher introduced during her turn. Van de Pol proposed the measurement of “degree of teacher control” (TDc) on a six-step scale, starting from No Control (TDc0 – when the teacher was not with the students) to Highest degree of control (TDc5 – when the teacher provided new content, elicited no response and was providing the students with an explanation of the answer to a particular question).

Other schemes have focused on the distribution of cognitive effort between the tutor (also teacher or parent) and the student, in particular, who is doing the “heavy lifting” at particular points during instruction. Pino-Pasternak et al. [18] were interested in determining if the level of parental mediation impacted students’ self-regulated learning (SRL), that is, students’ ability to control and monitor their own learning processes. They found that contingent shifting between mediation levels supported children’s SRL. This scheme introduces the dimensions of cognitive demand (i.e. the distribution of cognitive effort between the parent and child), the student’s level of understanding and the operationalization of “contingent tutoring” in terms of a mediation level that shifts to meet students’ level of understanding. A similar approach was proposed by Nathan and Kim [16], who studied the way teachers regulate elicitations with respect to a cognitive hierarchy and in accordance with the correctness of students’ responses. Toward that end, they coded teachers’ elicitations using Mehan’s coding scheme [15].

In addition to the aforementioned schemes, Nystrand et al. [17] developed an approach to analyzing classroom discourse, focused on identifying factors that increase (or suppress) students’ question-asking and other types of interactions that make up rich discussions, or “dialogic spells.” Their approach includes a taxonomy that can be used to describe the cognitive complexity (which they call “cognitive level”) of teachers’ questions based on the level of abstraction and the status of information that the question invokes (i.e. new vs. old information).

Graesser et al. [10] focused on classifying the questions asked in tutorial dialogues. They defined 18 question categories based on their content. Furthermore, 8 of the aforementioned categories were further clustered into two subgroups: the questions that required a short answer (“Verification,” “Disjunctive,” “Concept completion,” “Feature Specification,” “Quantification”) and the questions that required long, elaborated responses (“Definition,” “Example,” “Comparison”). The remaining categories were: “Interpretation,” “Causal antecedent,” “Goal orientation,” “Instrumental/procedural,” “Enablement,” “Judgmental,” “Assertion,” “Request/directive”. Reasoning and deeper understanding are usually exposed with questions that ask “how” or “why” and invite long, well-elaborated answers [10].

Scaffolding is a dynamic process. The tutor might change levels of support from one turn to the next and in accordance with the student’s response. The main factor used to characterize the student’s response to support is correctness: was the student’s response to the tutor’s question/hint correct, partially correct, or incorrect? Other factors that might influence change of the level of support have also been suggested, such as the difficulty level of the subject matter, time available and teachers’ global perception of the student’s ability (e.g. [7]).

Human tutors are obviously unable to carry out detailed and highly accurate diagnoses of student knowledge [21]; their assessments of students’ knowledge deficits are often inaccurate [3]. However, they nonetheless construct and dynamically update a normative mental representation of students’ grasp of the domain content under discussion, as reflected in tutors’ adaptive responses to students’ need for scaffolding or remediation [13]. For example, if a student solves a problem quickly and accurately, the tutor will probably challenge the student with some questions that go beyond the current problem’s level of difficulty. On the other hand, if a student is struggling, the tutor will go slowly, perhaps clarifying step by step the knowledge the student seems to be lacking. As a tutoring session progresses, the tutor will dynamically update his or her conception of what the student knows and does not know. This allows the tutor to select appropriate problems to solve (macro-adaptation): perhaps simpler problems if the student has not done well or more challenging ones if the student is performing proficiently. The tutor’s dialogue with the student also enables the tutor to focus on particular curriculum elements (facts, concepts, skills, etc.) to discuss during a given problem and to determine the appropriate level at which to discuss these elements (micro-adaptation). However, dynamically adjusting the level of support according to students’ changing understanding of domain concepts is not a trivial task, for humans or intelligent tutors.

3. METHODOLOGY
3.1 Core Rationale
To better understand the mechanics of tutorial discussions, we studied a corpus of human-to-human tutorial dialogues. In particular, we took two approaches to better understand how tutors vary the “level of support” that they provide to students. The first was a fairly extensive literature review of coding schemes for tutoring discussions [7,10,11,16–19]. The second was to test the dimensions of level of support that other researchers have identified, by coding a corpus of tutorial dialogues. This was an iterative process of reviewing, coding and evaluating the results, which led to the creation of a coding scheme for the level of support in tutorial dialogues.

3.2 Coding scheme
In order to test the accuracy and coverage of the factors that define “level of support” that we identified through our literature review, as summarized in the previous section, we coded tutors’ turns in human tutor-student dialogues with respect to these factors (dimensions). In particular, we defined four dimensions:

- Level of Control: we have adapted the dimension of “Degree of Teacher Control” as described in [19] and coded it using a three-step scale (Low, Medium, High) where: Low control signifies that the teacher provided no new content and/or asked an open-ended question, expecting a well-elaborated answer; Medium control signifies provision of new content that is not directly related to the question or seeks a short answer; High control signifies that the teacher provides new content or provides a hint or elicits no response and instead provides an explanation. Table 1 presents three examples where the teacher provided feedback at different levels of control based on her perception with respect to the student’s level of understanding of Newton’s Second Law.

- Question Category: we have used the question categories as described in the coding scheme of Graesser et al. [10]. In particular, we used 18 categories to code the teacher interventions, for example: verification, disjunctive, concept completion, etc. (presented in Related Work). Some examples of how this dimension was applied to the corpus are presented later on (see Table 7).

Table 1. Three examples of Low, Medium and High tutorial feedback regarding the Level of Control
Teacher: So what is the net force on her?
Student: 39N
Teacher (Low): Which direction?
Teacher (Medium): Which way? Direction? Up or Down?
Teacher (High): We have 500 N from the rope pulling up and 539N from her weight (the gravitational force from the earth) pulling down. So what is the direction of the net force?

- Level of Specificity: this dimension refers to whether the tutor provided specific and focused information to the student. It was coded on a three-step scale: Low specificity signifies that the tutor does not provide detail or specific information to the student; Medium specificity signifies that the tutor provides some specific information related to the student’s input but not enough to lead her to the answer; High specificity signifies that the teacher provides detailed feedback to the student, directly related to the issue in question. Specificity is an important factor of instructional dialogue and is usually perceived as an attribute of the information content that the teacher provides to the student (for example, Van de Pol differentiates between broad, open questions and detailed ones [19]).

In our case, this dimension refers to the specificity of the teacher’s feedback in relation to the student input that precedes the tutor’s turn and, in this sense, it is different from the Level of Control. This means that a tutor turn could be coded as medium or high for Level of Control and low for Level of Specificity. For example, let us consider the following dialogue:

Teacher: What minimum acceleration must the climber have in order for the rope not to break while she is rappelling down the cliff?
Student: the acceleration equals the rope tension divided by the climber’s weight
Teacher: The general rule for finding acceleration is F= a*m. and this is known as Newton’s Second Law of motion. But here your answer is not correct. Keep in mind that Newton’s second law of motion can be formally stated as: “The acceleration of an object as produced by a net force is directly proportional to the magnitude of the net force, in the same direction as the net force, and inversely proportional to the mass of the object.” So…?
Student: ….

In this example, the student provides a wrong answer on how to compute acceleration and the teacher replies with feedback regarding the general context. However, the teacher does not provide feedback on why the student’s answer is wrong or what in particular should be corrected. Examples of the three levels of specificity are shown in Table 2.

Table 2. Three examples of Low, Medium and High tutorial feedback regarding the Level of Specificity
Student: S:F = ma
Teacher: what's f there?
Student: mg
Teacher (Low): just mg?
Teacher (Medium): just mg ? how many forces act on the climber ?
Teacher (High): No.. the F in F=ma is always the net force on the object (or group of objects). The vector sum of all the forces on the object. I prefer to say "Sum of F= ma" because it's easier to get it right.

- Contingency: to code contingency in the tutor’s turn, we adapted the coding scheme of Pino-Pasternak et al. [18]. According to this scheme, contingent tutoring takes place when the tutor challenges the student with questions or comments that are at or above her potential and non-contingent tutoring happens when the tutor poses questions and tasks that are lower than the student’s potential. This dimension was coded on a binary scale (i.e. contingent vs. non-contingent). Examples of contingent and non-contingent tutorial feedback are presented in Table 3.

Table 3. Examples of contingent and non-contingent teacher’s input for the same Student-Teacher dialogue
Teacher: what is the acceleration then?
Student: Fnet=ma, so a = 39/55
Teacher (Contingent): Can you provide the units?
Teacher (Non-Contingent): Ok, a = 39 N / 55 kg

3.3 Dataset
We applied our coding scheme to part of a dialogue corpus that stems from previous research to assess the effectiveness of human-guided reflective discussions about physics problems [12]. In particular, the study involved students who were taking an introductory physics course at the University of Pittsburgh; these students were randomly assigned to three conditions: one in which students received reflection questions and interacted with a human tutor via a chat interface; a second reflection condition in which students were asked the same set of reflection questions but received a static text explanation as feedback after they responded to these questions; and a third, control condition in which students were not asked reflection questions but solved more problems than students in the other two conditions, in order to control for time on task. From this corpus, we chose three human-to-human tutorial dialogues on Newton’s Second Law (i.e. “The acceleration of an object as produced by a net force is directly proportional to the magnitude of the net force, in the same direction as the net force, and inversely proportional to the mass of the object”) from the first condition (students engaging in a typed dialogue with their tutor via a simple chat interface, about each reflection question) for further analysis.

The three dialogues were chosen to represent students who displayed three levels of gain from pretest to posttest: one low, one medium and one high. The problem and the reflection question for all three dialogues were stated as: “Problem: A rock climber of mass 55 kg slips while scaling a vertical face. Fortunately, her carabiner holds and she is left hanging at the bottom of her safety line. Suppose the maximum tension that the rope can support is 500 N. Reflection Question: What minimum acceleration must the climber have in order for the rope not to break while she is rappelling down the cliff? (You do not have to come up with a numerical answer. Just solve for "a" without any substitution of numbers.)”. In effect, this question asked students to name the forces that act on the climber and to apply Newton’s Second Law in order to compute the acceleration.

3.4 Applying the coding scheme to our dataset
Four researchers (i.e. the authors of this paper – from now on referred to as “experts” for simplicity) were given an introduction to the coding scheme and the dialogues. They were also provided with a coding template and the rules and directions for how to code the dialogues. In particular, they were asked to code each tutor’s turn for all three dialogues. An excerpt of one of the dialogues, for a “high gain” student, is shown in Table 4. The tutor’s turn (highlighted in grey) was coded by experts with respect to four dimensions that relate to the Level of Support. Overall, the experts coded 19 tutor turns. When they completed the coding process, they participated in a focus group where they discussed their coding, the process of applying the dimensions, and problems or challenges that they faced in doing so. The results of the coding process and the comments and concerns of the experts are presented in the next section.

Table 4. Excerpt from a tutorial dialogue between a student (high gainer) and a tutor.
Student: 500/55 kg=a m/s^2
Teacher: I don't agree - that's the acceleration that just the pull from the rope would produce (well once the units are straightened out it would be). Think a little more
Student: I'm stuck. I know you have to take into account her weight and an additional acceleration to account for the extra 39N, but I'm not really sure how they fit together.
Teacher: All right. What is the general rule for finding acceleration from forces?
Student: F/m=a
Teacher: and what is the F there?
Student: tension?
Teacher: No.. the F in F=ma is always the net force on the object (or group of objects). The vector sum of all the forces on the object. I prefer to say "Sum of F= ma" because it's easier to get it right. So.. if she is sliding down and the rope is just short of breaking, what is the *net* force on her?
Student: 0
Teacher: hmm hmm that was what it was in the problem above. Now we are in the case where the rope breaks at >500N. What's the tension in the rope just short of breaking?
Student: 500N
Teacher: Right. that's pulling her which way?

4. RESULTS
In order to assess the reliability of agreement between the experts, we computed Fleiss’ Kappa for all four coding dimensions of the scheme. Of course, these are only preliminary results and therefore the Fleiss’ Kappa should be considered no more than a glimpse at the effectiveness and accuracy of the coding scheme. The results are displayed in Table 5. The inter-rater agreement can be interpreted as fair for the dimensions of Level of control and Question category and poor for the Level of specificity [2,14]. For the dimension of Contingency, the result is not statistically significant (p-value > 0.05). However, the inter-rater agreement results and the discussion that followed showed that the suggested coding scheme did not adequately capture the nuances of how tutors dynamically adapt their responses to student input during human-to-human tutorial dialogue.

Table 5. Results of Fleiss’ Kappa for the reliability of inter-rater agreement and for the four dimensions of the coding scheme.
Dimension | Fleiss’ Kappa | p-value
Level of control | 0.404 | 4.13e-11
Question category | 0.395 | 0
Level of specificity | 0.141 | 0.0245
Contingency | 0.0764 | 0.415

We also gave the coders the option to record their comments or the reasoning behind their codes while coding the dialogues. We further analyzed their free-text comments about their coding and additional explanations/justifications that they expressed during a focus-group discussion. Analysis of the free-text answers revealed that experts usually had different opinions about what the goal of the intervention was. In some cases, they even stated that most likely the teacher didn’t have a specific goal but was instead trying to assess the student’s knowledge state. Frequently, the experts stated that a specific intervention served multiple goals that related to both backward and forward functions. As backward, we define the part of the tutor’s response that relates to the student’s prior input and as forward, we define the part of the tutor’s response that aims to provide hints, support, or guidance to the student towards the correct answer [9]. A dialogue excerpt, along with one tutoring expert’s comments about the teacher’s turn, is presented in Table 6.

Table 6. Example of the expert’s comments during the coding process with respect to backward/forward functions codes
Dialogue:
Student: 500/55 kg=a m/s^2
Teacher: I don't agree - that's the acceleration that just the pull from the rope would produce (well once the units are straightened out it would be). Think a little more
Expert's comments: First (in response to student answer): Show student that the answer is incorrect by telling him in what situation it would be correct. Second (to move forward): Given what the tutor said in "first" get the student to attempt to solve the problem again.

Dialogue:
Student: I'm stuck. I know you have to take into account her weight and an additional acceleration to account for the extra 39N, but I'm not really sure how they fit together.
Teacher: All right. What is the general rule for finding acceleration from forces?
Expert's comments: First (in response to student answer): reassures student that it is okay. Second (to move forward): Get student to think about the correct answer by going to first principles.

Teachers often provided feedback and guidance in one dialogue turn. This caused mismatches in the coding of the four dimensions. For example, the experts stated that it was difficult to assess the Level of control of the teacher’s intervention because she uses explicit hints to help the student but, at the same time, she poses an open-ended question. There were similar issues when assessing the Level of specificity. In that case, experts commented that it was hard because the teachers tended to give elaborated feedback on students’ past answers but not further details on future steps. Therefore, it was not easy to decide on the specificity of the teacher’s intervention. Finally, all experts agreed on the importance of these two dimensions, i.e. Level of control and Level of specificity, but they expressed a need for more precise instructions for distinguishing between these categories and applying them.

For the dimension of Question Category, the experts stated that the categories were too numerous (18 categories) and that in some cases multiple categories would apply to one intervention or that none of the existing categories was appropriate for some tutor turns. We present a related comment from an expert in Table 7. Some experts also expressed doubts about how to apply the Contingency dimension. One mentioned that she coded a tutor turn as ‘contingent’ “if the tutor's response reflected the level of understanding of the student and it mostly did”.

Table 7. Example of the expert’s comments during the coding process with respect to question categorization
Dialogue:
Student: 500/55 kg=a m/s^2
Teacher: I don't agree - that's the acceleration that just the pull from the rope would produce (well once the units are straightened out it would be). Think a little more
Question Category: Request/directive
Expert's comments: It is like a request to "try again". Admittedly what the student is trying again is his/her previous attempt at quantification. Note the original question is not quantification - it doesn't ask for the value of a variable but the student interprets it that way.

Dialogue:
Student: I'm stuck. I know you have to take into account her weight and an additional acceleration to account for the extra 39N, but I'm not really sure how they fit together.
Teacher: All right. What is the general rule for finding acceleration from forces?
Question Category: Definition
Expert's comments: doesn't ask what does x mean but does the inverse (inverse of definition). No other (question category) seems to fit.

From the discussion that followed the coding process, it was obvious that the experts were not satisfied with the application of the coding scheme and the results. They stated that it was still unclear to them how teachers were effectively regulating the level of support during tutorial dialogues and how we can define metrics to provide support to teachers during the orchestration of dialogues. One of the main issues appeared to be the complexity of the dialogues themselves, as well as the ambiguity of both students’ and teachers’ interventions.

5. DISCUSSION
In this paper we presented a preliminary study on the analysis of human-to-human tutorial dialogues. We carried out this analysis as a precursor to developing student modeling for an adaptive tutorial dialogue system. However, we believe that our approach could be informative to teaching and teaching analytics, especially for socio-oriented, constructivist approaches where dialogues between teachers and students are considered essential for the learning process.

The goal of this analysis was to review existing frameworks for coding and analyzing tutorial dialogues and to define a scheme for characterizing the level of support during dialogues, that is, how does a tutor effectively regulate when, and how much, support to provide? Four domain experts coded tutors’ interventions in human-to-human tutorial dialogues and the results of the coding process were presented and discussed in focus group meetings.

From the results, it appears that the crucial factors in defining the levels of support are the amount of new content or new information shared by the tutor and the degree of detail (or specificity) in the tutor’s help. The experts stated that it is very important to differentiate between the information that is offered as feedback to previous questions, the information that relates to the background knowledge that students may have and the information that is offered as hints or in order to push the student forward.

Even though the tutoring experts agreed that contingency between the teacher’s and the student’s turns plays an important role, the experts found it difficult to code student-tutor exchanges in terms of contingency. This may be related to the level of discourse analysis, which in this case was very low (i.e. at the exchange level, vs. at the episode and dialogue level). It is possible that we need larger sequences of tutor-student turns in order to appropriately assess contingency.

5.1 Towards a coding approach for characterizing “level of support”
So far, we studied the design and application of a coding scheme to define and characterize the Level of Support in tutorial dialogues. The coding scheme was designed based on a thorough literature review of related research. The results of applying the scheme revealed some weaknesses of the coding approach and the need for more precision in defining and applying some of its dimensions, as presented in the Results section. In light of these findings, we have further revised the proposed coding scheme. The new coding scheme has four dimensions:

• D1. Information related to the student’s answer: The first dimension refers to the amount and level of specificity of the information provided to the student and that is related to the student’s prior answer. It is coded on a four-step scale (None, low, medium and high).
• D2. Hints provision: The second dimension refers to the hints that are provided to the student, either directly or through questions. It is coded on a four-step scale (None, low, medium and high).
• D3. Feedback on correctness: This dimension refers to the feedback the tutor provides to the student’s previous reply with respect to correctness. Attempts to move forward and ambiguous statements (e.g. “but what about the net force?”) are not considered feedback. In the case where the tutor’s turn does not follow a student’s answer, this dimension is coded as non-applicable. This dimension is coded on a four-step scale (None, Implicit, Explicit and Non-Applicable).
• D4. Information related to the “Feedback on correctness”: This dimension refers to the explanation or information the tutor provides about her feedback on the correctness of the student’s previous turn. It is coded on a three-step scale (Yes – when the tutor provides an explanation along with feedback; No – when the tutor provides no explanation along with her feedback; Non-applicable).

Relative to the original coding scheme, we split the “Level of control” dimension into two: the “Information related to the answer” dimension and the “Hint provision” dimension. This was done because providing a hint is considerably different from providing information (sometimes the teacher just provides general information to describe the context) and thus, these two factors could not be captured by the same dimension.

[...] tutor’s turn based on the list of categories we provided them. The dimension of “Level of Specificity” was also split into two categories: “Feedback on Correctness” and “Information related to ‘Feedback on correctness’”. Additionally, we eliminated the dimension of “Contingency” because, at least in this dialogue corpus, non-contingent tutor turns were rare.

Currently, we have only carried out trial applications of the coding scheme in order to refine the dimensions and the coding levels. However, the results so far are very encouraging both in terms of inter-rater agreement (Cohen’s kappa for the four dimensions ranged from 0.764 to 0.871) and the experts’ comments. Nonetheless, we need to further validate the coding scheme by applying it to more data.

5.2 Limitations of the study
This study was part of a broader project that aims to enhance an adaptive tutorial dialogue system using student modeling techniques. Our goal was to characterize the various levels of support that teachers provide to students during human-to-human tutorial dialogues and to identify the factors that affect the provision of support. Towards that end, we have focused on characteristics of the tutors’ feedback, such as the amount and specificity of information, the provision of feedback, etc. However, there are other factors that were not taken into consideration in this study. One important factor is the student model that the teacher mentally and dynamically builds and maintains for each student. The teacher builds this mental model of the student based on the student’s answers. Based on this model, the teacher regulates the dialogue and the level of support, as the teacher deems appropriate. The effect that this might have on the teacher’s feedback is demonstrated in Table 8, where we present a case from our corpus. Based on informal comments about students’ ability level that this tutor expressed to one of the authors, we are aware that this tutor perceived student A as an underachiever and student B as an overachiever.

Based on the students’ responses, both student A and student B do not understand the meaning of the net force. However, it is evident that the teacher provides more information and support to student B than student A.

Taking into consideration the effect the teacher’s perception about students’ overall ability level might have on the level of support is an extremely complicated issue that we have not addressed in our coding scheme. However, we acknowledge this is an important factor that should not be overlooked in developing more adaptive tutorial dialogue systems.

Table 8. Two examples of tutorial dialogues that reflect the tutor’s perception of each student’s overall ability
Student A (underachiever) - Teacher Dialogue:
Student A: a = f / m
T: what's f ?

Student B (overachiever) - Teacher Dialogue:
Student B: 500/55 kg=a m/s^2
T: I don't agree - that's the acceleration that just the pull from the rope would produce (well once the units are straightened out it would be). Think a little more
Student B: I'm stuck. I know you have to take into account her
The weight and an additional dimension “Question Category” was eliminated because the acceleration to account for the coding results did not reveal solid relations between different extra 39N, but I'm not really sure categories and the level of support. Moreover, the experts Student A: f = mg how they fit together. mentioned that it was extremely complicated and hard to code the T: All right. What is the general human-to-human tutorial dialogues, we came across several cases T: just mg ? how many forces act rule for finding acceleration from where the teacher would adapt the level of discussion based on ont he climber ? forces? her perception with respect to the student’s level of understanding, Student A: mg + T Student B: F/m=a rather than the actual student’s response. It was evident that teachers provided more information and less hints to low T: is mg down or up? T: and what is the F there? achievers while they were reluctant to give away the answer or Student A: down and T is up Student B: tension? too much information to the high achievers. T: No.. the F in F=ma is always the For example, let us consider two students: Frank is a low net force on the object (or group of objects). The vector sum of all performer who lacks basic knowledge in motion laws and who is the forces on the object. I prefer not confident for his skills in physics. On the contrary, Nancy is a to say "Sum of F= ma" because it's high performer with good background knowledge who enjoys easier to get it right. So.. if she is studying physics. Their teacher has to provide appropriate sliding down and the rope is just feedback taking into account their prior knowledge and personal T: ok so now solve for a again short of breaking, what is the characteristics. Based on our observations of human-to-human plugging in T and mg *net* force on her? 
tutorial dialogues, in the case of an incorrect student answer, the teacher might want to provide information and explanation to Student A: a = (mg + T) / m Student B: 0 Frank, encouraging him to repeat basic concepts and definitions. T: hmm hmm that was what it For Nancy, the teacher would encourage her to try again and to was in the problem above. Now we are in the case where the rope check her line of reasoning for possible mistakes, without giving breaks at >500N. What's the away the answer. tension in the rope just short of Defining explicit guidelines on what kind of feedback is T: which direction is mg in ? breaking? appropriate for specific student types can assist the teacher in providing personalized student feedback. Furthermore, this set of Application of the coding scheme was carried out by the authors guidelines can be helpful for students in teacher education and of this paper (“experts”). We plan to get input from domain young professionals, who do not yet have the expertise to evaluate experts (i.e. physics teachers) for our coding scheme and formally tutorial dialogues, especially in real time. validate it further. 5.4 Future work 5.3 Dialogue-support mechanisms as teaching This paper presented a preliminary study of the work-in-progress on a project that aims to develop an adaptive dialogue tutoring analytics system. Currently, we are working on the refinement of the coding Our objective is to study the mechanisms driving human-to- scheme for the assessment of Level of Support for tutorial human tutorial dialogue and use this information to create dialogues. So far, we have identified factors that affect the level of algorithms and principles to guide effective, automated tutorial support in dialogues, focusing on computationally tractable dialogue use this information to create mechanisms and rules to dimensions; that is, dimensions that can be captured by automated support effective dialogue orchestration. 
In our case, we aim to or semi-automated measures and indicators, in order to develop an enhance a dialogue-based intelligent tutor to support adaptive adaptive tutorial dialogue system. interactions. However, this line of research can be used to support teachers in other challenging settings, such as in large classrooms Our primary focus in analyzing and coding “level of support” is to or in distance-learning scenarios, where the need for teaching specify authoring principles for adaptive tutoring systems—that analytics is prominent [8]. In particular, we envision the use of is, rules for how to tailor tutor responses for different levels of dialogue-related indicators to provide feedback to teachers and student understanding— with respect to a given domain, and with recommendations on how to appropriately support their students. respect to specific domain knowledge components. Towards that This can be achieved by creating appropriate visualizations and end, we will also work with teachers. In future work, we will data analytics based on dialogue-related indicators and integrating involve them in implementing a rule-based approach for them into teacher dashboards. For example, we could provide structuring adaptive tutorial dialogues. visual indication of the amount of information a teacher provides to a student or a visualization of the content a teacher contributes 6. REFERENCES to a topic in comparison to the content the student contributes to 1. Vincent Aleven, Franceska Xhakaj, Kenneth Holstein, and the same topic. Bruce M. McLaren. 2016. Developing a Teacher Dashboard For Use with Intelligent Tutoring Systems. Fourth So far, teacher dashboards provide information about the tutor- International Workshop on Teaching Analytics. student interactions that mostly has to do with the number of 2. Douglas G. Altman. 1990. Practical statistics for medical messages students exchange with the automated tutor, or the rate research. CRC press. 
of exchange [24] (an exception to this is recent work by Aleven et 3. Michelene TH Chi, Stephanie A. Siler, and Heisawn Jeong. al [1]) We can enhance this work by adding content-related or 2004. Can tutors monitor students’ understanding quality-related information, such as what concepts have been accurately? Cognition and instruction 22, 3: 363–387. covered or how well students have elaborated on arguments. We 4. Michelene TH Chi, Stephanie A. Siler, Heisawn Jeong, could also recommend to teachers emphasizing certain aspects of Takashi Yamauchi, and Robert G. Hausmann. 2001. the dialogue, such as, leaving time for student self-reflection or Learning from human tutoring. Cognitive Science 25, 4: providing elaborated information instead of hints or feedback on 471–533. correctness. This can be achieved by defining guidelines on 5. Min Chi, Kurt VanLehn, and Diane Litman. 2010. Do micro- feedback provision with respect to different student types and level tutorial decisions matter: Applying reinforcement different levels of understanding. From our experience analyzing learning to induce pedagogical tutorial tactics. In Intelligent 16. Mitchell J. Nathan and Suyeon Kim. 2009. Regulation of Tutoring Systems, 224–234. teacher elicitations in the mathematics classroom. Cognition 6. Min Chi, Kurt VanLehn, Diane Litman, and Pamela Jordan. and Instruction 27, 2: 91–120. 2011. An evaluation of pedagogical tutorial tactics for a 17. Martin Nystrand, Lawrence L. Wu, Adam Gamoran, Susie natural language tutoring system: A reinforcement learning Zeiser, and Daniel A. Long. 2003. Questions in time: approach. International Journal of Artificial Intelligence in Investigating the structure and dynamics of unfolding Education 21, 1–2: 83–113. classroom discourse. Discourse processes 35, 2: 135–198. 7. Christine Chin. 2006. Classroom interaction in science: 18. Deborah Pino-Pasternak, David Whitebread, and Andrew Teacher questioning and feedback to students’ responses. Tolmie. 2010. 
A multidimensional analysis of parent–child International journal of science education 28, 11: 1315– interactions during academic tasks and their relationships 1346. with children’s self-regulated learning. Cognition and 8. Irene-Angelica Chounta and Nikolaos Avouris. 2014. Instruction 28, 3: 219–272. Towards the real-time evaluation of collaborative activities: 19. Janneke Eva Pol and others. 2012. Scaffolding in teacher- Integration of an automatic rater of collaboration quality in student interaction: exploring, measuring, promoting and the classroom from the teacher’s perspective. Education and evaluating scaffolding. Information Technologies: 1–21. 20. Barbara Z. Presseisen and Alex Kozulin. 1992. Mediated 9. Mark G. Core and James Allen. 1997. Coding dialogs with Learning–The Contributions of Vygotsky and Feuerstein in the DAMSL annotation scheme. In AAAI fall symposium on Theory and Practice. communicative action in humans and machines, 28–35. 21. Ralph T. Putnam. 1987. Structuring and adjusting content for 10. Arthur C. Graesser and Natalie K. Person. 1994. Question students: A study of live and simulated tutoring of addition. asking during tutoring. American educational research American educational research journal 24, 1: 13–48. journal 31, 1: 104–137. 22. Carla van de Sande and James G. Greeno. 2010. A framing 11. Arthur C. Graesser, Natalie K. Person, and Joseph P. of instructional explanations: Let us explain with you. In Magliano. 1995. Collaborative dialogue patterns in Instructional explanations in the disciplines. Springer, 69– naturalistic one-to-one tutoring. Applied cognitive 82. psychology 9, 6: 495–522. 23. Roland G. Tharp and Ronald Gallimore. 1991. The 12. Sandra Katz and Patricia L. Albacete. 2013. A tutoring Instructional Conversation: Teaching and Learning in Social system that simulates the highly interactive nature of human Activity. Research Report: 2. tutoring. Journal of Educational Psychology 105, 4: 1126. 24. 
Eleni Voyiatzaki and Nikolaos Avouris. 2014. Support for 13. Sandra Katz, David Allbritton, and John Connelly. 2003. the teacher in technology-enhanced collaborative classroom. Going beyond the problem given: How human tutors use Education and Information Technologies 19, 1: 129–154. post-solution discussions to support transfer. International 25. Lev Vygotsky. 1978. Interaction between learning and Journal of Artificial Intelligence in Education 13, 1: 79–116. development. Readings on the development of children 23, 14. J. Richard Landis and Gary G. Koch. 1977. The 3: 34–41. measurement of observer agreement for categorical data. 26. Beverly Park Woolf. 2010. Building intelligent interactive biometrics: 159–174. tutors: Student-centered strategies for revolutionizing e- 15. Hugh Mehan. 1979. Learning lessons. Harvard University learning. Morgan Kaufmann. Press Cambridge, MA.