=Paper=
{{Paper
|id=Vol-1738/IWTA_2016_paper2
|storemode=property
|title=Analysis of Human-to-Human Tutorial Dialogues: Insights for Teaching Analytics
|pdfUrl=https://ceur-ws.org/Vol-1738/IWTA_2016_paper2.pdf
|volume=Vol-1738
|authors=Irene-Angelica Chounta,Bruce M. McLaren,Patricia Albacete,Pamela Jordan, Sandra Katz
|dblpUrl=https://dblp.org/rec/conf/ectel/ChountaMAJK16
}}
==Analysis of Human-to-Human Tutorial Dialogues: Insights for Teaching Analytics==
<pdf width="1500px">https://ceur-ws.org/Vol-1738/IWTA_2016_paper2.pdf</pdf>
<pre>
            Analysis of Human-to-Human Tutorial Dialogues:
                    Insights for Teaching Analytics
  Irene-Angelica Chounta, Bruce M. McLaren                               Patricia Albacete, Pamela Jordan, Sandra Katz
            Human-Computer Interaction Institute                                 Learning Research and Development Center
               Carnegie Mellon University                                                  University of Pittsburgh
               Pittsburgh PA, 15213, USA                                                 Pittsburgh PA, 15260, USA
          {ichounta,bmclaren}@cs.cmu.edu                                         {palbacet, pjordan, katz}@pitt.edu


ABSTRACT                                                                         How can we define the “level of support” that a successful
In this paper we present a preliminary analysis of human-to-                      tutor gives during tutoring?
human tutorial dialogues as a precursor to developing an adaptive                What makes some help given by a tutor more generous or
tutorial dialogue system, guided by a student model. One of our                   stingy, easy or challenging, straightforward or
main goals is to further understand what makes tutorial dialogue                  “cognitively complex” for students?
successful, in particular how tutorial dialogues adapt to different
student characteristics and prior knowledge and how to provide            Several researchers have addressed these questions, in various
feedback to students in order to further support their practice. In       domains, for various purposes—for example, to develop
particular we aim to identify important factors that affect tutorial      instructional materials for classroom and computer-based learning
dialogues and to characterize the level of support provided to            environments; to address questions about how scaffolding and its
students with different levels of understanding. Our approach and         counterpart, “contingent tutoring”, take place in naturalistic
findings could also inform teaching and teaching analytics.               learning settings in order to guide teacher training; to measure the
                                                                          effectiveness of curricula that implement scaffolding as a key
CCS Concepts                                                              feature. It is not surprising that the diverse set of goals driving the
• Applied computing ➝ Education.                                          quest for ways to operationalize “levels of support” would
                                                                          produce an equally diverse set of descriptive frameworks.
Keywords                                                                  The aim of this paper is thus threefold:
Tutorial dialogues; level of support; student modeling; adaptive
                                                                             1. To explore how teachers regulate discussions and adapt
dialogue; teaching; learning
                                                                             their levels of support during tutorial dialogues,
1. INTRODUCTION                                                              2. To identify the factors that define “level of support” (LOS)
The learning process and its outcome highly depend on the social             in human-to-human tutoring examples, and
interaction between teachers and students and, in particular, on the         3. To propose analytics and mechanisms to guide tutors in
proficient, helpful and focused use of language that students are            orchestrating effective and efficient interventions in adaptive
exposed to by their teachers through written texts or tutoring               tutorial dialogues.
discussions [20]. The findings from this research gave rise to
several methodologies that promote dialogue as a means of                 In the following sections, we present the process of identifying
keeping students engaged and motivated (such as instructional             important factors and constructing a coding scheme to
conversation [23]). This shift to socially-oriented methods was           characterize the level of support in tutorial dialogues. However,
also observed for technology-enhanced learning contexts and for           our aim is not to use this scheme to analyze tutorial dialogues, but
intelligent tutoring systems [26].                                        to provide guidance to dialogue authors for tailoring the level of
                                                                          support to provide to students who exhibit different levels of
Our research goal is to develop an adaptive tutorial dialogue             understanding.
system, guided by a student model. In this paper we present a
preliminary analysis of human-to-human tutorial dialogues as a            2. RELATED WORK
precursor to achieving our main objective. Our approach and               Several researchers who have examined the question, “Why is
findings could also be informative to teaching and amenable to            human tutoring so effective?”, have proposed that this effect is
teaching analytics.                                                       due to the highly interactive nature of human tutoring—in other
A key focus of our project is to further understand what makes            words, the degree to which the student and tutor react to and build
tutorial dialogue successful; in particular, how tutorial dialogues       upon each other’s dialogue moves and perceived understanding.
adapt to different student characteristics and prior knowledge and        This has been called the Interaction Hypothesis (e.g., [4,11,22]).
how to provide feedback to students in order to further support           However, an important line of research carried out over the past
their practice. According to Vygotsky, tutors use their assessment        decade to test this hypothesis has found that it is not how much
of students’ ability to adapt the level of discussion to the student’s    interaction takes place during tutoring that’s important, nor the
“zone of proximal development” (ZPD)—that is, a little bit                granularity of the interaction—for example, whether the student
beyond the student’s current level of understanding about a               and tutor discuss a step towards solving a problem or the sub-
concept, ability to perform a skill, etc. [25]. In particular, we are     steps that lead to that step. Rather, what matters is how well the
interested in the following questions:                                    interaction is carried out—for example, what content is addressed
and how it is addressed, in a particular dialogue context (e.g.,          completion,” “Feature Specification,” “Quantification”) and the
[5,6]).                                                                   questions that required long, elaborated responses (“Definition,”
In order to study and analyze the dynamics of dialogue, either in         “Example,” “Comparison”). The remaining categories were:
the classroom or in one-on-one settings, researchers have                 “Interpretation,” “Causal antecedent,” “Goal orientation,”
attempted to identify distinctive features of instructional dialogue      “Instrumental/procedural,”      “Enablement,”      “Judgmental,”
and to define schemes for characterizing “level of support”.              “Assertion,” “Request/directive”. Reasoning and deeper
Although most of the existing coding schemes were developed for           understanding are usually exposed with questions that ask “how”
problem-solving or other task-oriented domains, they may also             or “why” and invite long, well-elaborated answers [10].
prove relevant for operationalizing “level of support” for                Scaffolding is a dynamic process. The tutor might change levels
conceptually-oriented, reflective dialogues. The diverse set of           of support from one turn to the next and in accordance with the
goals driving the quest for ways to operationalize “levels of             student’s response. The main factor used to characterize the
support” has produced an equally diverse set of descriptive               student’s response to support is correctness: was the student’s
frameworks. However, these schemes can be grouped according               response to the tutor’s question/hint correct, partially correct, or
to the underlying, typically tacit dimensions that their developers       incorrect? Other factors that might influence change of the level
used to differentiate the “levels of support” included in their           of support have also been suggested, such as the difficulty level of
coding schemes. The most common dimensions are the degree of              the subject matter, time available and teachers’ global perception
detail (or specificity) in the tutor’s help and the level of “cognitive   of the student’s ability (e.g., [7]).
complexity” in the tutor’s comment, question, or directive. For           Human tutors are obviously unable to carry out detailed and
example, Van de Pol’s approach to measuring scaffolding                   highly accurate diagnoses of student knowledge [21]; their
involves characterizing the teacher’s “level of control” [19]. The        assessments of students’ knowledge deficits are often inaccurate
main dimensions in this scheme are the degree of “openness” or            [3]. However, they nonetheless construct and dynamically update
detail in the requested response, the length of the requested             a normative mental representation of students’ grasp of the
response and the amount of new content that the teacher                   domain content under discussion, as reflected in tutors’ adaptive
introduced during her turn. Van de Pol proposed the measurement           responses to students’ need for scaffolding or remediation ([13]).
of “degree of teacher control” (TDc) on a six-step scale, starting        For example, if a student solves a problem quickly and accurately,
from No Control (TDc0 – when the teacher was not with the                 the tutor will probably challenge the student with some questions
students) to Highest degree of control (TDc5 – when the teacher           that go beyond the current problem’s level of difficulty. On the
provided new content, elicited no response and was providing the          other hand, if a student is struggling, the tutor will go slowly,
students with an explanation of the answer to a particular                perhaps clarifying step by step the knowledge the student seems to
question).                                                                be lacking. As a tutoring session progresses, the tutor will
Other schemes have focused on the distribution of cognitive effort        dynamically update his or her conception of what the student
between the tutor (also teacher or parent) and the student—in             knows and does not know. This allows the tutor to select
particular, who is doing the “heavy lifting” at particular points         appropriate problems to solve (macro–adaptation)—perhaps
during instruction. Pino-Pasternak et al. [18] were interested in         simpler problems if the student has not done well or more
determining if the level of parental mediation impacted students’         challenging ones if the student is performing proficiently. The
self-regulated learning (SRL) —that is, students’ ability to control      tutor’s dialogue with the student also enables the tutor to focus on
and monitor their own learning processes. They found that                 particular curriculum elements (facts, concepts, skills, etc.) to
contingent shifting between mediation levels supported children’s         discuss during a given problem and to determine the appropriate
SRL. This scheme introduces the dimensions of cognitive                   level at which to discuss these elements (micro-adaptation).
demand (i.e. the distribution of cognitive effort between the parent      However, dynamically adjusting the level of support according to
and child), the student’s level of understanding and the                  students’ changing understanding of domain concepts is not a
operationalization of “contingent tutoring” in terms of a mediation       trivial task – for humans or intelligent tutors.
level that shifts to meet students’ level of understanding. A similar
approach was proposed by Nathan and Kim [16], who studied the             3. METHODOLOGY
way teachers regulate elicitations with respect to a cognitive            3.1 Core Rationale
hierarchy and in accordance with the correctness of students’             To better understand the mechanics of tutorial discussions, we
responses. Toward that end, they coded teachers’ elicitations using       studied a corpus of human-to-human tutorial dialogues. In
Mehan’s coding scheme [15].                                               particular, we took two approaches to better understand how
In addition to the aforementioned schemes, Nystrand et al. [17]           tutors vary the “level of support” that they provide to students.
developed an approach to analyzing classroom discourse, focused           The first was a fairly extensive literature review of coding
on identifying factors that increase (or suppress) students’              schemes for tutoring discussions [7,10,11,16–19]. The second was
question-asking and other types of interactions that make up rich         to test the dimensions of level of support that other researchers
discussions, or “dialogic spells.” Their approach includes a              have identified, by coding a corpus of tutorial dialogues. This was
taxonomy that can be used to describe the cognitive complexity            an iterative process of reviewing, coding and evaluating the
(which they call “cognitive level”) of teachers’ questions based on       results that led to the creation of a coding scheme for the
the level of abstraction and the status of information that the           level of support in tutorial dialogues.
question invokes (i.e. new vs. old information).
                                                                          3.2 Coding scheme
Graesser et al. [10] focused on classifying the questions asked in        In order to test the accuracy and coverage of the factors that
tutorial dialogues. They defined 18 question categories based on          define “level of support” that we identified through our literature
their content. Furthermore, 8 of the aforementioned categories            review, as summarized in the previous section, we coded tutors’
were further clustered into two subgroups: the questions that             turns in human tutor-student dialogues with respect to these
required a short answer (“Verification,” “Disjunctive,” “Concept          factors (dimensions). In particular, we defined four dimensions:
- Level of Control: we have adapted the dimension of “Degree of               Teacher: The general rule for finding acceleration is F=
Teacher Control” as described in [19] and coded it using a three-             a*m. and this is known as Newton’s Second Law of
step scale (Low, Medium, High) where: Low control signifies that              motion. But here your answer is not correct. Keep in
the teacher provided no new content and/or asked an open-ended                mind that Newton’s second law of motion can be
question, expecting a well-elaborated answer; Medium control                  formally stated as: “The acceleration of an object as
signifies provision of new content that is not directly related to the        produced by a net force is directly proportional to the
question or seeks a short answer; High control signifies that the             magnitude of the net force, in the same direction as the
teacher provides new content or provides a hint or elicits no
                                                                              net force, and inversely proportional to the mass of the
response and instead provides an explanation. Table 1 presents
                                                                              object.” So…?
three examples where the teacher provided feedback at different
levels of control based on her perception with respect to the                 Student: ….
student’s level of understanding of Newton’s Second Law.                 In this example, the student provides a wrong answer on how to
- Question Category: we have used the question categories as             compute acceleration and the teacher replies with feedback
described in the coding scheme of Graesser et al. [10]. In               regarding the general context. However, the teacher does not
particular, we used 18 categories to code the teacher interventions,     provide feedback on why the student’s answer is wrong or what in
for example: verification, disjunctive, concept completion,              particular should be corrected.
etc. (presented in Related Work). Some examples of
how this dimension was applied to the corpus are presented later
on (see Table 7).

Table 1. Three examples of Low, Medium and High tutorial feedback regarding the Level of Control

                                 Level of Control
          Low          Medium             High

Teacher                  So what is the net force on her?

Student                                 39N

Teacher   direction?   Which direction?   Which way? We have 500 N from
                       Up or Down?        the rope pulling up and 539N
                                          from her weight (the
                                          gravitational force from the
                                          earth) pulling down. So what is
                                          the direction of the net force?

Table 2. Three examples of Low, Medium and High tutorial feedback regarding the Level of Specificity

                                 Level of Support
          Low        Medium               High

Student                              S:F = ma

Teacher                          what's f there?

Student                                 mg

Teacher   just mg?   just mg ? how many   No.. the F in F=ma is always
                     forces act on the    the net force on the object
                     climber ?            (or group of objects). The
                                          vector sum of all the forces
                                          on the object. I prefer to say
                                          "Sum of F= ma" because it's
                                          easier to get it right.

                                                                         Examples of the three levels of specificity are shown in Table 2.
- Level of Specificity: this dimension refers to whether the tutor       - Contingency: to code contingency in the tutor’s turn, we adapted
provided specific and focused information to the student. It was         the coding scheme of Pino-Pasternak et al. [18]. According to this
coded using a 3-step scale as: Low specificity signifies that the        scheme, contingent tutoring takes place when the tutor challenges
tutor does not provide detail or specific information to the student;    the student with questions or comments that are at or above her
Medium specificity signifies that the tutor provides some specific       potential and non-contingent tutoring happens when the tutor
information related to the student’s input but not enough to lead        poses questions and tasks that are lower than the student’s
her to the answer; High specificity signifies that the teacher           potential. This dimension was coded on a binary scale (i.e.
provides detailed feedback to the student, directly related to the       contingent vs. non-contingent). Examples of contingent and non-
issue in question. Specificity is an important factor of instructional   contingent tutorial feedback are presented in Table 3.
dialogue and is usually perceived as an attribute of the
                                                                         Table 3. Examples of contingent and non-contingent teacher’s
information content that the teacher provides to the student (for
                                                                                 input for the same Student-Teacher dialogue
example, Van de Pol differentiates between broad, open questions
and detailed ones [19]).                                                                      Contingent                     Non‐Contingent
In our case, this dimension refers to the specificity of the teacher’s   Teacher                   what is the acceleration then?
feedback in relation to the student input that precedes the tutor’s
turn and in this sense, it is different than the Level of Control.       Student                      Fnet=ma, so a = 39/55
This means that a tutor turn could be coded as medium or high for        Teacher     Can you provide the units?         Ok, a = 39 N / 55 kg
Level of control and low for Level of Specificity. For example, let
us consider the following dialogue:
     Teacher: What minimum acceleration must the climber                 3.3 Dataset
     have in order for the rope not to break while she is                We applied our coding scheme to part of a dialogue corpus that
                                                                         stems from previous research to assess the effectiveness of
     rappelling down the cliff?
                                                                         human-guided reflective discussions about physics problems [12].
     Student: the acceleration equals the rope tension                   In particular, the study involved students who were taking an
     divided by the climber’s weight                                     introductory physics course at the University of Pittsburgh; these
                                                                         students were randomly assigned to three conditions: one in which
                                                                         students received reflection questions and interacted with a human
tutor via a chat interface; a second reflection condition in which    Teacher: No.. the F in F=ma is always the net force on the object
students were asked the same set of reflection questions but          (or group of objects). The vector sum of all the forces on the
received a static text explanation as feedback after they responded
                                                                      object. I prefer to say "Sum of F= ma" because it's easier to get it
to these questions; and a third, a control condition in which
                                                                      right. So.. if she is sliding down and the rope is just short of
students were not asked reflection questions but solved more
problems than students in the other two conditions, in order to       breaking, what is the *net* force on her?
control for time on task. From this corpus, we chose three human-     Student: 0
to-human tutorial dialogues on Newton’s Second Law (i.e. “The         Teacher: hmm hmm that was what it was in the problem above.
acceleration of an object as produced by a net force is directly      Now we are in the case where the rope breaks at >500N. What's
proportional to the magnitude of the net force, in the same           the tension in the rope just short of breaking?
direction as the net force, and inversely proportional to the mass
of the object”) from the first condition (students engaging in a      Student: 500N
typed dialogue with their tutor via a simple chat interface,          Teacher: Right. that's pulling her which way?
about each reflection question) for further analysis.
The three dialogues were chosen to represent students who
displayed three levels of gain from pretest to posttest: one low,
                                                                      4. RESULTS
                                                                      In order to assess the reliability of agreement between the experts,
one medium and one high. The problem and the reflection
                                                                      we computed Fleiss’ Kappa for all four coding dimensions of
question for all three dialogues were stated as: “Problem: A rock
                                                                      the scheme. Of course, these are only preliminary results and
climber of mass 55 kg slips while scaling a vertical face.
                                                                      therefore the Fleiss’ Kappa should be considered no more than a
Fortunately, her carabiner holds and she is left hanging at the
                                                                      glimpse at the effectiveness and accuracy of the coding scheme.
bottom of her safety line. Suppose the maximum tension that the
                                                                      The results are displayed in Table 5. The inter-rater agreement can
rope can support is 500 N. Reflection Question: What minimum
                                                                      be interpreted as fair for the dimensions of Level of control and
acceleration must the climber have in order for the rope not to
                                                                      Question category and poor for the Level of specificity [2,14]. For
break while she is rappelling down the cliff? (You do not have to
                                                                      the dimension of Contingency, the result is not statistically
come up with a numerical answer. Just solve for "a" without any
                                                                      significant (p-value > 0.05). However, the inter-rater agreement
substitution of numbers.)”. In effect, this question asked students
                                                                      results and the discussion that followed showed that the suggested
to name the forces that act on the climber and to apply Newton’s
                                                                      coding scheme did not adequately capture the nuances of how
Second Law in order to compute the acceleration.
                                                                      tutors dynamically adapt their responses to student input, during
3.4 Applying the coding scheme to our dataset                         human-to-human tutorial dialogue.
Four researchers (i.e. the authors of this paper – from now on         Table 5. Results of Fleiss’ Kappa for the reliability of inter-
referred to as “experts” for simplicity) were given an introduction    rater agreement and for the four dimensions of the coding
to the coding scheme and the dialogues. They were also provided                                   scheme.
with a coding template and the rules and directions for how to
code the dialogues. In particular, they were asked to code each                    Dimension          Fleiss’ Kappa    p‐value
tutor’s turn for all three dialogues. An excerpt of one of the                  Level of control         0.404        4.13e‐11
dialogues, for a “high gain” student, is shown in Table 4. The
tutor’s turn (highlighted in grey) was coded by experts with                   Question category         0.395            0
respect to four dimensions that relate to the Level of Support.                Level of specificity      0.141         0.0245
Overall, the experts coded 19 tutor turns. When they completed
                                                                                  Contingency            0.0764         0.415
the coding process, they participated in a focus group where they
discussed their coding, the process of applying the dimensions,
and problems or challenges that they faced in doing so. The           We also gave the coders the opportunity to provide their
results of the coding process and the comments and concerns of        comments or the reasoning for their codings while coding the
the experts are presented in the next section.                        dialogues. We further analyzed their free-text comments about
 Table 4. Excerpt from a tutorial dialogue between a student          their coding and additional explanations/justifications that they
                  (high gainer) and a tutor.                          expressed during a focus-group discussion. Analysis of the free-
                                                                      text answers revealed that experts usually had different opinions
Student: 500/55 kg=a m/s^2                                            about what the goal of the intervention was. In some cases, they
Teacher: I don't agree ‐ that's the acceleration that just the pull   even stated that most likely the teacher didn’t have a specific goal
from the rope would produce (well once the units are                  but was instead trying to assess the student’s knowledge state.
straightened out it would be). Think a little more                    Frequently, the experts stated that a specific intervention served
Student: I'm stuck. I know you have to take into account her          multiple goals that related to both backward and forward
weight and an additional acceleration to account for the extra        functions. As backward, we define the part of the tutor’s response
39N, but I'm not really sure how they fit together.                   that relates to the student’s prior input and as forward, we define
Teacher: All right. What is the general rule for finding              the part of the tutor’s response that aims to provide hints, support,
                                                                      and guidance to the student towards the correct answer [9]. A dialogue
acceleration from forces?
                                                                      excerpt, along with one tutoring expert’s comments about the
Student: F/m=a                                                        teacher’s turn, is presented in Table 6.
Teacher: and what is the F there?
Student: tension?
Table 6. Example of the expert’s comments during the coding
process with respect to backward/forward functions codes

Dialogue                          Expert's comments

Student: 500/55 kg=a m/s^2

Teacher: I don't agree ‐ that's   First (in response to student answer):
the acceleration that just the    Show student that the answer is
pull from the rope would          incorrect by telling him in what
produce (well once the units      situation it would be correct.
are straightened out it would     Second (to move forward): Given
be). Think a little more          what the tutor said in "first" get the
                                  student to attempt to solve the
                                  problem again.

Student: I'm stuck. I know you
have to take into account her
weight and an additional
acceleration to account for the
extra 39N, but I'm not really
sure how they fit together.

Teacher: All right. What is the   First (in response to student answer):
general rule for finding          reassures student that it is okay.
acceleration from forces?         Second (to move forward): Get
                                  student to think about the correct
                                  answer by going to first principles.

Teachers often provided feedback and guidance in one dialogue
turn. This caused mismatches in the coding of the four
dimensions. For example, the experts stated that it was difficult to
assess the Level of control of the teacher’s intervention because
she uses explicit hints to help the student but, at the same time,
she poses an open-ended question. There were similar issues when
assessing the Level of specificity. In that case, experts commented
that it was hard because the teachers tended to give elaborated
feedback on students’ past answers but not further details on
future steps. Therefore, it was not easy to decide on the specificity
of the teacher’s intervention. Finally, all experts agreed on the
importance of these two dimensions, i.e. Level of control and
Level of specificity but they expressed a need for more precise
instructions for distinguishing between these categories, and
applying them.
For the dimension of Question Category, the experts stated that
the categories were too numerous (18 categories) and that in some
cases multiple categories would apply to one intervention or that
none of the existing categories was appropriate for some tutor
turns. We present a related comment from an expert in Table 7.
Some experts also expressed doubts about how to apply the
Contingency dimension. One mentioned that she coded a tutor
turn as ‘contingent’ “if the tutor's response reflected the level of
understanding of the student and it mostly did”.

From the discussion that followed the coding process, it was
obvious that the experts were not satisfied with the application of
the coding scheme and the results. They stated that it was still
unclear to them how teachers were effectively regulating the level
of support during tutorial dialogues and how we can define
metrics to provide support to teachers during the orchestration of
dialogues. One of the main issues appeared to be the complexity
of the dialogues themselves, as well as the ambiguity of both
students’ and teachers’ interventions.

Table 7. Example of the expert’s comments during the coding
process with respect to question categorization

Dialogue                  Question            Expert's comments
                          Category

Student: 500/55 kg=a
m/s^2

Teacher: I don't agree    Request/directive   It is like a request to
‐ that's the                                  "try again". Admittedly
acceleration that just                        what the student is
the pull from the rope                        trying again is his/her
would produce (well                           previous attempt at
once the units are                            quantification. Note
straightened out it                           the original question is
would be). Think a                            not quantification ‐ it
little more                                   doesn't ask for the
                                              value of a variable but
                                              the student interprets
                                              it that way.

Student: I'm stuck. I
know you have to
take into account her
weight and an
additional
acceleration to
account for the extra
39N, but I'm not
really sure how they
fit together.

Teacher: All right.       Definition          doesn't ask what does
What is the general                           x mean but does the
rule for finding                              inverse (inverse of
acceleration from                             definition). No other
forces?                                       (question category)
                                              seems to fit.

5. Discussion
In this paper we presented a preliminary study on the analysis of
human-to-human tutorial dialogues. We carried out this analysis
as a precursor to developing student modeling for an adaptive
tutorial dialogue system. However, we believe that our approach
could be informative to teaching and teaching analytics, especially
for socio-oriented, constructivist approaches where dialogues
between teachers and students are considered essential for the
learning process.
The goal of this analysis was to review existing frameworks for
coding and analyzing tutorial dialogues and to define a scheme for
characterizing the level of support during dialogues; that is, how
does a tutor effectively regulate when, and how much, support to
provide? Four domain experts coded tutors’ interventions in
human-to-human tutorial dialogues and the results of the coding
process were presented and discussed in focus group meetings.
From the results, it appears that the crucial factors in defining the
levels of support are the amount of new content or new
information shared by the tutor and the degree of detail (or
specificity) in the tutor’s help. The experts stated that it is very
important to differentiate between the information that is offered
as feedback to previous questions, the information that relates to
the background knowledge that students may have and the
information that is offered as hints or in order to push the student
forward.
Even though the tutoring experts agreed that contingency between        tutor’s turn based on the list of categories we provided them. The
the teacher’s and the student’s turns plays an important role, the      dimension of “Level of Specificity” was also split into two
experts found it difficult to code student-tutor exchanges in terms     categories: “Feedback on Correctness” and “Information related
of contingency. This may be related to the level of discourse           to ‘Feedback on correctness’”. Additionally, we eliminated the
analysis, which in this case was very low (i.e. at the exchange         dimension of “Contingency” because, at least in this dialogue
level, vs. at the episode and dialogue level). It is possible that we   corpus, non-contingent tutor turns were rare.
need larger sequences of tutor-student turns in order to                Currently, we have only carried out trial applications of the
appropriately assess contingency.                                       coding scheme in order to refine the dimensions and the coding
5.1 Towards a coding approach for                                       levels. However, the results so far are very encouraging both in
                                                                        terms of inter-rater agreement (Cohen’s kappa for the four
characterizing “level of support”                                       dimensions ranged from 0.764 to 0.871) and the experts’
So far, we studied the design and application of a coding scheme        comments. Nonetheless, we need to further validate the coding
to define and characterize the Level of Support in tutorial             scheme, by applying it to more data.
dialogues. The coding scheme was designed based on a thorough
literature review of related research. The results of applying the      5.2 Limitations of the study
scheme revealed some weaknesses of the coding approach and the          This study was part of a broader project that aims to enhance an
need for more precision in defining and applying some of its            adaptive tutorial dialogue system using student modeling
dimensions, as presented in the Results section. In light of these      techniques. Our goal was to characterize the various levels of
findings, we have further revised the proposed coding scheme.           support that teachers provide to students during human-to-human
The new coding scheme has four dimensions:                              tutorial dialogues and to identify the factors that affect the
                                                                        provision of support. Towards that end, we have focused on
         • D1. Information related to the student’s answer: The          characteristics of the tutors’ feedback, such as the amount and
          first dimension refers to the amount and level of             specificity of information, the provision of feedback, etc.
          specificity of the information provided to the student        However, there are other factors that were not taken into
          and that is related to the student’s prior answer. It is      consideration in this study. One important factor is the student
                                                                        model that the teacher mentally, dynamically builds and maintains
          coded on a four-step scale (None, low, medium and
                                                                        for each student. The teacher builds this mental model of the
          high).                                                        student based on the student’s answers. Based on this model, the
         • D2. Hints provision: The second dimension refers to           teacher regulates the dialogue and the level of support, as the
          the hints that are provided to the student, either directly   teacher deems appropriate. The effect that this might have on the
          or through questions. It is coded on a four-step scale        teacher’s feedback is demonstrated in Table 8 where we present a
          (None, low, medium and high).                                 case from our corpus. Based on informal comments about
                                                                        students’ ability level that this tutor expressed to one of the
         • D3. Feedback on correctness: This dimension refers to
                                                                        authors, we are aware that this tutor perceived student A as an
          the feedback the tutor provides to the student’s previous     underachiever and student B as an overachiever.
          reply with respect to correctness. Attempts to move
                                                                        Based on the students’ responses, both student A and student B do
          forward and ambiguous statements (e.g. “but what about        not understand the meaning of the net force. However, it is
          the net force?”) are not considered feedback. In the case     evident that the teacher provides more information and support to
          where the tutor’s turn does not follow a student’s            student B than student A.
          answer, this dimension is coded as non-applicable. This       Taking into consideration the effect the teacher’s perception about
          dimension is coded on a four-step scale (None, Implicit,      students’ overall ability level might have on the level of support is
          Explicit and Non-Applicable).                                 an extremely complicated issue that we have not addressed in our
         • D4. Information related to the “Feedback on                   coding scheme. However, we acknowledge this is an important
          correctness”: This dimension refers to the explanation        factor that should not be overlooked in developing more adaptive
                                                                        tutorial dialogue systems.
          or information the tutor provides about her feedback on
          the correctness of the student’s previous turn. It is           Table 8. Two examples of tutorial dialogues that reflect the
          coded on a three-step scale (Yes – when the tutor                   tutor’s perception of each student’s overall ability
          provides an explanation along with feedback; No –             Student A (underachiever) ‐         Student B (overachiever)‐
                                                                        Teacher Dialogue                    Teacher Dialogue
          when the tutor provides no explanation along with her
          feedback; Non-applicable).                                    Student A: a = f / m                Student B: 500/55 kg=a m/s^2
                                                                                                            T: I don't agree ‐ that's the
Relative to the original coding scheme, we split the “Level of                                              acceleration that just the pull
control” dimension into two: the “Information related to the                                                from the rope would produce (well
                                                                                                            once the units are straightened
answer” dimension and the “Hint provision” dimension. This was
                                                                                                            out it would be). Think a little
done because providing a hint is considerably different than
                                                                        T: what's f ?                       more
providing information (sometimes the teacher just provides                                                  Student B: I'm stuck. I know you
general information to describe the context) and thus, these two                                            have to take into account her
factors could not be captured by the same dimension. The                                                    weight and an additional
dimension “Question Category” was eliminated because the                                                    acceleration to account for the
coding results did not reveal solid relations between different                                             extra 39N, but I'm not really sure
categories and the level of support. Moreover, the experts              Student A: f = mg                   how they fit together.
mentioned that it was extremely complicated and hard to code the
                                    T: All right. What is the general        human-to-human tutorial dialogues, we came across several cases
T: just mg ? how many forces act    rule for finding acceleration from       where the teacher would adapt the level of discussion based on
ont he climber ?                    forces?                                  her perception with respect to the student’s level of understanding,
Student A: mg + T                   Student B: F/m=a                         rather than the actual student’s response. It was evident that
                                                                             teachers provided more information and fewer hints to low
T: is mg down or up?                T: and what is the F there?              achievers while they were reluctant to give away the answer or
Student A: down and T is up         Student B: tension?                      too much information to the high achievers.
                                    T: No.. the F in F=ma is always the
                                                                             For example, let us consider two students: Frank is a low
                                    net force on the object (or group
                                    of objects). The vector sum of all
                                                                             performer who lacks basic knowledge in motion laws and who is
                                    the forces on the object. I prefer       not confident in his physics skills. In contrast, Nancy is a
                                    to say "Sum of F= ma" because it's       high performer with good background knowledge who enjoys
                                    easier to get it right. So.. if she is   studying physics. Their teacher has to provide appropriate
                                    sliding down and the rope is just        feedback taking into account their prior knowledge and personal
T: ok so now solve for a again      short of breaking, what is the           characteristics. Based on our observations of human-to-human
plugging in T and mg                *net* force on her?                      tutorial dialogues, in the case of an incorrect student answer, the
                                                                             teacher might want to provide information and explanation to
Student A: a = (mg + T) / m         Student B: 0
                                                                             Frank, encouraging him to repeat basic concepts and definitions.
                                    T: hmm hmm that was what it
                                                                             For Nancy, the teacher would encourage her to try again and to
                                    was in the problem above. Now
                                    we are in the case where the rope
                                                                             check her line of reasoning for possible mistakes, without giving
                                    breaks at >500N. What's the              away the answer.
                                    tension in the rope just short of        Defining explicit guidelines on what kind of feedback is
T: which direction is mg in ?       breaking?                                appropriate for specific student types can assist the teacher in
                                                                             providing personalized student feedback. Furthermore, this set of
Application of the coding scheme was carried out by the authors              guidelines can be helpful for students in teacher education and
of this paper (“experts”). We plan to get input from domain                  young professionals, who do not yet have the expertise to evaluate
experts (i.e. physics teachers) for our coding scheme and formally           tutorial dialogues, especially in real time.
validate it further.                                                         5.4 Future work
5.3 Dialogue-support mechanisms as teaching                                  This paper presented a preliminary study of the work-in-progress
                                                                             on a project that aims to develop an adaptive dialogue tutoring
analytics                                                                    system. Currently, we are working on the refinement of the coding
Our objective is to study the mechanisms driving human-to-                   scheme for the assessment of Level of Support for tutorial
human tutorial dialogue and to use this information to create                dialogues. So far, we have identified factors that affect the level of
algorithms and principles that guide effective, automated tutorial           support in dialogues, focusing on computationally tractable
dialogue, as well as mechanisms and rules that support effective             dimensions; that is, dimensions that can be captured by automated
dialogue orchestration. In our case, we aim to                               or semi-automated measures and indicators, in order to develop an
enhance a dialogue-based intelligent tutor to support adaptive               adaptive tutorial dialogue system.
interactions. However, this line of research can be used to support
teachers in other challenging settings, such as in large classrooms          Our primary focus in analyzing and coding “level of support” is to
or in distance-learning scenarios, where the need for teaching               specify authoring principles for adaptive tutoring systems—that
analytics is prominent [8]. In particular, we envision the use of            is, rules for how to tailor tutor responses for different levels of
dialogue-related indicators to provide feedback to teachers and              student understanding— with respect to a given domain, and with
recommendations on how to appropriately support their students.              respect to specific domain knowledge components. Towards that
This can be achieved by creating appropriate visualizations and              end, we will also work with teachers. In future work, we will
data analytics based on dialogue-related indicators and integrating          involve them in implementing a rule-based approach for
them into teacher dashboards. For example, we could provide                  structuring adaptive tutorial dialogues.
visual indication of the amount of information a teacher provides
to a student or a visualization of the content a teacher contributes         6. REFERENCES
to a topic in comparison to the content the student contributes to           1.   Vincent Aleven, Franceska Xhakaj, Kenneth Holstein, and
the same topic.                                                                   Bruce M. McLaren. 2016. Developing a Teacher Dashboard
                                                                                  For Use with Intelligent Tutoring Systems. Fourth
So far, teacher dashboards provide information about the tutor-                   International Workshop on Teaching Analytics.
student interactions that mostly has to do with the number of                2.   Douglas G. Altman. 1990. Practical statistics for medical
messages students exchange with the automated tutor, or the rate                  research. CRC press.
of exchange [24] (an exception to this is recent work by Aleven et           3.   Michelene TH Chi, Stephanie A. Siler, and Heisawn Jeong.
al. [1]). We can enhance this work by adding content-related or                   2004. Can tutors monitor students’ understanding
quality-related information, such as what concepts have been                      accurately? Cognition and instruction 22, 3: 363–387.
covered or how well students have elaborated on arguments. We                4.   Michelene TH Chi, Stephanie A. Siler, Heisawn Jeong,
could also recommend that teachers emphasize certain aspects of                   Takashi Yamauchi, and Robert G. Hausmann. 2001.
the dialogue, such as leaving time for student self-reflection or                 Learning from human tutoring. Cognitive Science 25, 4:
providing elaborated information instead of hints or feedback on                  471–533.
correctness. This can be achieved by defining guidelines on                  5.   Min Chi, Kurt VanLehn, and Diane Litman. 2010. Do micro-
feedback provision with respect to different student types and                    level tutorial decisions matter: Applying reinforcement
different levels of understanding. From our experience analyzing
      learning to induce pedagogical tutorial tactics. In Intelligent   16. Mitchell J. Nathan and Suyeon Kim. 2009. Regulation of
      Tutoring Systems, 224–234.                                            teacher elicitations in the mathematics classroom. Cognition
6.    Min Chi, Kurt VanLehn, Diane Litman, and Pamela Jordan.               and Instruction 27, 2: 91–120.
      2011. An evaluation of pedagogical tutorial tactics for a         17. Martin Nystrand, Lawrence L. Wu, Adam Gamoran, Susie
      natural language tutoring system: A reinforcement learning            Zeiser, and Daniel A. Long. 2003. Questions in time:
      approach. International Journal of Artificial Intelligence in         Investigating the structure and dynamics of unfolding
      Education 21, 1–2: 83–113.                                            classroom discourse. Discourse processes 35, 2: 135–198.
7.    Christine Chin. 2006. Classroom interaction in science:           18. Deborah Pino-Pasternak, David Whitebread, and Andrew
      Teacher questioning and feedback to students’ responses.              Tolmie. 2010. A multidimensional analysis of parent–child
      International journal of science education 28, 11: 1315–              interactions during academic tasks and their relationships
      1346.                                                                 with children’s self-regulated learning. Cognition and
8.    Irene-Angelica Chounta and Nikolaos Avouris. 2014.                    Instruction 28, 3: 219–272.
      Towards the real-time evaluation of collaborative activities:     19. Janneke Eva Pol and others. 2012. Scaffolding in teacher-
      Integration of an automatic rater of collaboration quality in         student interaction: exploring, measuring, promoting and
      the classroom from the teacher’s perspective. Education and           evaluating scaffolding.
      Information Technologies: 1–21.                                   20. Barbara Z. Presseisen and Alex Kozulin. 1992. Mediated
9.    Mark G. Core and James Allen. 1997. Coding dialogs with               Learning–The Contributions of Vygotsky and Feuerstein in
      the DAMSL annotation scheme. In AAAI fall symposium on                Theory and Practice.
      communicative action in humans and machines, 28–35.               21. Ralph T. Putnam. 1987. Structuring and adjusting content for
10.   Arthur C. Graesser and Natalie K. Person. 1994. Question              students: A study of live and simulated tutoring of addition.
      asking during tutoring. American educational research                 American educational research journal 24, 1: 13–48.
      journal 31, 1: 104–137.                                           22. Carla van de Sande and James G. Greeno. 2010. A framing
11.   Arthur C. Graesser, Natalie K. Person, and Joseph P.                  of instructional explanations: Let us explain with you. In
      Magliano. 1995. Collaborative dialogue patterns in                    Instructional explanations in the disciplines. Springer, 69–
      naturalistic one-to-one tutoring. Applied cognitive                   82.
      psychology 9, 6: 495–522.                                         23. Roland G. Tharp and Ronald Gallimore. 1991. The
12.   Sandra Katz and Patricia L. Albacete. 2013. A tutoring                Instructional Conversation: Teaching and Learning in Social
      system that simulates the highly interactive nature of human          Activity. Research Report: 2.
      tutoring. Journal of Educational Psychology 105, 4: 1126.         24. Eleni Voyiatzaki and Nikolaos Avouris. 2014. Support for
13.   Sandra Katz, David Allbritton, and John Connelly. 2003.               the teacher in technology-enhanced collaborative classroom.
      Going beyond the problem given: How human tutors use                  Education and Information Technologies 19, 1: 129–154.
      post-solution discussions to support transfer. International      25. Lev Vygotsky. 1978. Interaction between learning and
      Journal of Artificial Intelligence in Education 13, 1: 79–116.        development. Readings on the development of children 23,
14.   J. Richard Landis and Gary G. Koch. 1977. The                         3: 34–41.
      measurement of observer agreement for categorical data.           26. Beverly Park Woolf. 2010. Building intelligent interactive
      biometrics: 159–174.                                                  tutors: Student-centered strategies for revolutionizing e-
15.   Hugh Mehan. 1979. Learning lessons. Harvard University                learning. Morgan Kaufmann.
      Press Cambridge, MA.
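
Section 5.1 reports inter-rater agreement for the revised four-dimension
coding scheme as Cohen's kappa. As a reading aid, the sketch below shows
one way the scale values of D1 to D4 could be represented and how kappa
could be computed for two coders on one dimension. It is a minimal sketch
under stated assumptions, not the authors' implementation: the names
SCALES and cohens_kappa, the lowercase spelling of the scale values, and
the six example codes are all hypothetical and are not data from the study.

```python
from collections import Counter

# Scale values of the revised coding scheme, as listed in Section 5.1.
# The dictionary layout and lowercase spellings are assumptions for illustration.
SCALES = {
    "D1": ("none", "low", "medium", "high"),        # information related to the answer
    "D2": ("none", "low", "medium", "high"),        # hints provision
    "D3": ("none", "implicit", "explicit", "n/a"),  # feedback on correctness
    "D4": ("yes", "no", "n/a"),                     # information related to that feedback
}


def cohens_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labelled the same sequence of items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(rater_a)
    # Observed agreement: share of items given the same label by both raters.
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement: sum over labels of the product of marginal proportions.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used the same single label throughout
    return (observed - expected) / (1.0 - expected)


# Invented D3 codes for six tutor turns (hypothetical, NOT from the corpus).
coder_1 = ["explicit", "implicit", "none", "explicit", "n/a", "explicit"]
coder_2 = ["explicit", "implicit", "none", "implicit", "n/a", "explicit"]

# Every code must be a legal value on the D3 scale.
assert set(coder_1) | set(coder_2) <= set(SCALES["D3"])

kappa_d3 = cohens_kappa(coder_1, coder_2)  # 10/13, roughly 0.77
```

In this invented example the coders agree on five of six turns, and
correcting for chance agreement gives a kappa of 10/13. Values in the
range reported in Section 5.1 would be computed the same way, once per
dimension, from the actual coded turns.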

</pre>