=Paper= {{Paper |id=Vol-1618/LBR2 |storemode=property |title= Exploring Contingent Step Decomposition in a Tutorial Dialogue System |pdfUrl=https://ceur-ws.org/Vol-1618/LBR2.pdf |volume=Vol-1618 |authors=Pamela Jordan,Patricia Albacete,Sandra Katz |dblpUrl=https://dblp.org/rec/conf/um/JordanAK16 }} == Exploring Contingent Step Decomposition in a Tutorial Dialogue System== https://ceur-ws.org/Vol-1618/LBR2.pdf
     Exploring Contingent Step Decomposition in a Tutorial
                       Dialogue System

                              Pamela Jordan, Patricia Albacete and Sandra Katz
                                         Learning Research & Development Center
                                                  University of Pittsburgh
                                                     Pittsburgh, USA
                              pjordan@pitt.edu, palbacet@pitt.edu, katz@pitt.edu

ABSTRACT
We explore the effectiveness of a simple algorithm for adaptively deciding whether to further
decompose a step in a line of reasoning during tutorial dialogue. We compare two versions of a
tutorial dialogue system, Rimac: one that always decomposes a step to its simplest sub-steps and
one that adaptively decides to decompose a step based on a student’s pre-test assessment. We
hypothesize that students using the two versions of Rimac will learn similarly but that students
who use the version that adaptively decomposes a step will learn more efficiently. Our initial
results suggest support for our hypothesis but the sample size for the experiment is still small
and we are continuing to collect more student interactions with the two versions of the system.

CCS Concepts
•Applied computing → Interactive learning environments; •Computing methodologies → Discourse,
dialogue and pragmatics;

Keywords
Adaptation; Dialogue; Contingency; Scaffolding; Intelligent Tutoring Systems

1.   INTRODUCTION
   Wood introduced the idea of contingent tutoring in the 1970s after analyzing face-to-face
interactions between children (the learners) and adults (the tutors) [7]. Instructional
contingency refers to the amount of help or scaffolding the tutor offers the learner based on the
student’s current or previous response, while domain contingency refers to the issue of what the
tutor should focus on next (e.g., what content in the current task, what the next task should be,
what materials to use) and can involve deciding how to decompose a difficult task into potentially
easier sub-tasks [7]. There are also different ways in which to adapt to a student which have been
explored using tutorial dialogue systems, including adapting to learning style [6] and deciding
who should cover a step in the tutoring [2]. However, in our current implementation of a tutorial
dialogue system for physics, Rimac [5, 1], we focused on deciding when to decompose a task for the
learner (an aspect of domain contingency), which includes: (1) deciding whether to decompose the
reasoning needed to answer a post-problem reflection question (RQ), as in Figure 1, and (2) the
granularity of that discussion.

Problem:
Suppose you aim a bow horizontally, directly at the center of a target 25.0 m away from you. If
the speed of the arrow is 60 m/s, how far from the center of the target will it strike the target?
That is, find the vertical displacement of the arrow while it is in flight.
Assume there is no air friction.

Reflection Question (RQ):
Suppose the same archer shoots an identical arrow from the same spot on the cliff. Again he aims
the arrow perfectly horizontally with an initial velocity of 60 m/s. How does the vertical
velocity of the arrow change (remains the same, increases, decreases)?

Figure 1: An example problem and post-problem reflection question.

   Similar to Wood’s EXPLAIN, QUADRATIC and DATA tutors [7], Rimac decides whether to discuss the
line of reasoning (LOR) underlying a correct answer to an RQ and, if so, at what grain size (i.e.,
it decides whether to decompose a step in a task into simpler sub-steps). And similar to Wood’s
DATA tutor, Rimac bases decomposition decisions on pre-test assessments. Unlike Wood’s tutors,
help seeking is not left to the learner in that the tutorial dialogue system and the student are
engaged in a discussion of the LOR that leads to the answer to a reflection question and the
system always helps the student co-construct the next step in the LOR. To help the student
co-construct the step, Rimac uses hint strategies to elicit the step from the student. If the hint
fails, and the student is unable to co-construct the step, then the system either offers a more
specific hint, decomposes the step further and hints at each of its sub-steps, or simply completes
the step for the student.
   In this paper we explore an initial, simple algorithm for adaptively deciding whether to
further decompose a step after it has been successfully co-constructed. We compare two versions of
Rimac: one that always decomposes successfully co-constructed steps and one that adaptively
decides whether to decompose such a step based on students’ pre-test assessment. The reason for
decomposing a successfully co-constructed step is that the student may have contributed a correct
answer using incomplete reasoning or may have simply guessed correctly using intuition and thus it
could be beneficial to explicitly cover the underlying reasoning with the student. We hypothesize
that if our simple algorithm is
             Figure 2: The Rimac Interface and an example dialogue with short answer questions.


effective then students will learn similarly from using either version of the system but that
students who use the adaptive decomposition version will learn more efficiently.
   The rationale for the hypothesis follows. First, students using either version of the system
can spend as much time as they need to complete the assigned problems. If the student fails to
successfully co-construct a decomposable step, then the system will respond by eliciting its
sub-steps. However, if the student succeeds at co-constructing the step then the student can
progress faster through the RQ. If the decision algorithm is successful, then the adaptive system
should enable a significant number of users to complete the problem faster because it will often
be accurate in its choice not to decompose a step after it is successfully co-constructed.
Furthermore, if a significant number of steps are not decomposed after a successful
co-construction, then less material is explicitly covered with the student. If it is not
detrimental to have “skipped” explicit mention of this material then learning gains for students
who used the adaptive system should be similar to learning gains for students who used the
non-adaptive system.
   While our initial results suggest support for this hypothesis, the sample size is still small
and we are continuing to collect more student interactions with the system.

2.   RIMAC
   Rimac is a web-based natural-language tutoring system that engages students in conceptual
discussions after they solve quantitative physics problems [5, 1] and was built using the TuTalk
tutorial dialogue toolkit [4]. Thus the dialogues authored for the system can be represented with
a finite state machine. Each state contains a single tutor turn. The arcs leaving the state
correspond to possible classifications of student turns. When creating a state, the dialogue
author enters the text for a tutor’s turn and defines classes of student responses (e.g. correct,
partially correct, incorrect). A single student response class is defined by entering a set of
semantically similar text phrases that correspond to how students might respond. TuTalk’s default
understanding module ranks the response classes defined for the current tutor state according to
the edit distance of the normalized words in the actual student response relative to the
normalized words in the text phrases that define each class. It selects the class with the minimum
edit distance as the best classification of the student’s response. However, if the minimum edit
distance is greater than a specified threshold, then the system classifies the student response as
unrecognizable.
   Rimac’s dialogues were developed to present a directed line of reasoning, or DLR [3]. During a
DLR, the tutor presents a series of carefully ordered questions to the student. If the student
answers a question correctly, he advances to the next question in the DLR. If the student provides
an incorrect answer, the system launches a remedial sub-dialogue and then returns to the main line
of reasoning after the sub-dialogue has completed. If the system is unable to understand the
student’s response then it completes the step for the student. Rimac asks mainly short answer
questions to improve the recognition of student responses as shown in Figure 2, which illustrates
the system’s follow-up
to correct, partially correct and incorrect answers.
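
The classification and transition behaviour described above can be sketched as follows. This is a minimal illustration and not TuTalk's actual code: the word-level Levenshtein distance, the normalization (lowercasing and stripping punctuation), the threshold value, the state names and the example phrases are all assumptions made for the example.

```python
from dataclasses import dataclass

PUNCTUATION = ".,!?;:'\""


def normalize(text):
    """Assumed normalization: lowercase words with surrounding punctuation removed."""
    words = (w.strip(PUNCTUATION) for w in text.lower().split())
    return [w for w in words if w]


def edit_distance(a, b):
    """Levenshtein distance between two word lists."""
    prev = list(range(len(b) + 1))
    for i, word_a in enumerate(a, start=1):
        curr = [i]
        for j, word_b in enumerate(b, start=1):
            cost = 0 if word_a == word_b else 1
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost))
        prev = curr
    return prev[-1]


def classify(response, classes, threshold):
    """Pick the response class whose closest phrase has the minimum edit distance.

    Returns "unrecognizable" when even the best match exceeds the threshold.
    """
    words = normalize(response)
    best_name, best_dist = None, None
    for name, phrases in classes.items():
        for phrase in phrases:
            dist = edit_distance(words, normalize(phrase))
            if best_dist is None or dist < best_dist:
                best_name, best_dist = name, dist
    if best_dist is None or best_dist > threshold:
        return "unrecognizable"
    return best_name


@dataclass
class TutorState:
    """One finite-state-machine state: a tutor turn plus arcs keyed by response class."""
    tutor_turn: str
    classes: dict  # response class -> example phrases entered by the author
    arcs: dict     # response class -> name of the next state (hypothetical names)


state = TutorState(
    tutor_turn="Can you please tell me what the vertical forces on the arrow are?",
    classes={
        "correct": ["gravity", "the force of gravity"],
        "incorrect": ["no forces", "air resistance"],
    },
    arcs={
        "correct": "ask_vertical_acceleration",   # advance along the DLR
        "incorrect": "remedial_subdialogue",      # remediate, then return to the DLR
        "unrecognizable": "tutor_completes_step", # the tutor states the step itself
    },
)

next_state = state.arcs[classify("Gravity.", state.classes, threshold=1)]
```

Keeping the threshold small makes the classifier conservative: an off-topic reply falls through to "unrecognizable" and the tutor completes the step, which matches the fallback behaviour described for Rimac.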
   Rimac’s dialogues are structured as hierarchical plan networks where a parent node abstracts
over its child nodes [8]. For example, a parent node of “travel to Chicago” may be decomposed into
more detailed child nodes such as “buy an airplane ticket to Chicago”, “go to the airport”, etc.
which in turn may be decomposed into even more detailed nodes. In the case of tutoring physics,
the upper-level parent nodes represent the problem solving strategy. See Figure 3 for an example
of part of a plan network for one of the Rimac dialogues we are using in our testing.

(1) Solve RQ
    (2) Determine net force
        (4) Identify forces
        (5) Determine vertical net force
            (8) Apply definition of net force
                (11) Get definition of net force
            (9) Compute vertical net force
    (3) Determine vertical acceleration
        (6) Apply Newton’s Second Law
            (10) Get definition of NSL
        (7) Compute vertical acceleration

Figure 3: Extract of plan network for responding to the RQ in Figure 1.

   The adaptive version of Rimac uses a decision algorithm to decide whether, after eliciting a
parent node, to expand the parent node and elicit its child nodes. For this formative evaluation
of the algorithm, we selected the nodes where decisions should be made instead of treating each
non-leaf node as a potential decision point.
   For example, in reference to the plan network in Figure 3, both example dialogues in Figure 4
first elicit the top child nodes of “(2) Determine net force” and “(3) Determine vertical
acceleration” for the parent node “(1) Solve RQ”. Notice that there are further decisions to make
concerning how to elicit each node. When eliciting “(2) Determine net force” the system elicits
one of the child nodes “(4) Identify forces” instead of directly eliciting “(2) Determine net
force”. For this experiment we left the decision about how to elicit each node to our content
specialists and this was static and identical across both versions of the system.
   Neither of the child nodes “(2) Determine net force” and “(3) Determine vertical acceleration”
is expanded further in the dialogue example in Figure 4 (left), which was generated by the
adaptive version of the system. Instead, the dialogue moves on to elicit a new sibling node not
shown in the plan network. However, in the dialogue example on the right in Figure 4, the system
decides to expand all decomposable nodes further [i.e., “(2) Determine net force” and “(3)
Determine vertical acceleration”]. The decision about whether to elicit one node or multiple nodes
before expanding those nodes is again left to the content specialists and is static and identical
across both versions of the system.
   Thus, the dialogue for the adaptive version of the system would range between that shown by the
dialogue on the left in Figure 4, where none of the target parent nodes is expanded, and that
shown by the dialogue on the right where the algorithm decides to expand every target parent node.

3.   CONTINGENT STEP DECOMPOSITION
   In the adaptive version of Rimac that we are testing, we use a student model that is
initialized with the student’s pre-test scores for the knowledge components (KCs) that need to be
applied to arrive at the correct answer to the reflection questions presented to students. In
future versions of the system (but not in this current test) we will update the student model
during the discussions with the tutor in an attempt to reflect students’ learning.
   The adaptive version of the system consults the student model at every decision point to
predict whether the student is likely to need the current step decomposed into simpler steps. Two
types of decision points occur: (1) after a reflection question (RQ) is answered by the student
and (2) when it is possible to further decompose a step into sub-steps. In the former case the
reflection question is the top node in the plan network and is decomposed by engaging in a
discussion of the reasoning with the student (i.e., it elicits some subset of child nodes). For
every decision point a set of prerequisite KCs has been identified that is expected to predict
whether the student sufficiently knows the knowledge expressed in the child nodes (sub-steps). The
student’s scores for that set of KCs are evaluated to decide whether or not to decompose the node
(step) further.
   Let KC_D be the set of KCs associated with decision point D where KC_d ∈ KC_D, a_i is the score
∈ {1, 0} for a pre-test item that tests KC_d and n is the number of test items testing KC_d. Let
S_D be the set of scores for KCs associated with decision point D where S_d ∈ S_D, S_d is the
score for KC_d and S_d is defined as:

      S_d = \frac{1}{n} \sum_{i=1}^{n} a_i                    (1)

   Finally, let T_D be the score for decision point D where T_D is defined as:

      T_D = \min(S_D)                                         (2)

   We consider a student with T_D >= .8 as very knowledgeable about the content that could be
skipped, .5 <= T_D < .8 as having medium-level knowledge, and T_D < .5 as having low-level
knowledge.
   The algorithm applied at each decision point D is defined as:

   if RQ node & node correct & T_D >= .8 then
      do not expand the RQ node
   else if RQ node & (node incorrect OR T_D < .8) then
      expand the RQ node
   else if node correct & T_D >= .5 then
      do not expand the node
   else if node incorrect OR T_D < .5 then
      expand the node
   end if

   To simplify implementation for this experiment, the child nodes of an RQ node have identical
KC_D and are a subset of KC_D for the RQ node. In future work, we will adjust each node’s KC_D to
more specifically reflect the content of its child nodes.

4.   METHODS
Left dialogue (target nodes not expanded)

  [Plan nodes 1, 2, 4]
  Tutor said: Let's begin the same way that we started the horizontal motion question: by stating
      the relevant forces on the arrow. Recall that we looked at the applied forces because they
      can cause the velocity to change.
  Tutor said: So, can you please tell me what the vertical forces on the arrow are?
  You said: gravity
  [Plan node 3]
  Tutor said: Very good. Since we know that the force of gravity is acting on the arrow, what does
      that mean about the arrow's vertical acceleration (zero, nonzero, etc)?
  You said: nonzero
  Tutor said: Okay! Since we know that the vertical acceleration of the arrow is nonzero, we can
      use this fact to gather information about its vertical velocity.

Right dialogue (target nodes fully expanded)

  [Plan nodes 1, 2, 4]
  Tutor said: Let's begin the same way that we started the horizontal motion question: by stating
      the relevant forces on the arrow. Recall that we looked at the applied forces because they
      can cause the velocity to change.
  Tutor said: So, can you please tell me what the vertical forces on the arrow are?
  You said: gravity
  [Plan node 3]
  Tutor said: Sounds good. Since we know that the force of gravity is acting on the arrow, what
      does that mean about the arrow's vertical acceleration (zero, nonzero, etc)?
  You said: nonzero
  [Plan nodes 5, 8, 11]
  Tutor said: Excellent. Again, let's make sure you understand this connection by referencing the
      definition of net force.
  Tutor said: Can you please tell me the definition of 'net force'?
  You said: sum of forces
  [Plan node 9]
  Tutor said: Excellent. And knowing that gravity is the only vertical force, what can you say
      about the vertical net force on the arrow (zero, nonzero, etc.)?
  You said: nonzero
  [Plan nodes 3, 6, 10]
  Tutor said: Yes, I agree. Now that we know the vertical net force, we can find the vertical
      acceleration. There is a law that can help us do that by relating net force and
      acceleration. What is that law?
  You said: f=ma
  Tutor said: Very good. We can use that equation which is Newton's Second Law.

Figure 4: The adaptive dialogue on the left represents the non-expanded network in Figure 3 and the adaptive
dialogue on the right represents the fully expanded network (as in the control version of the system).
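
The two dialogues in Figure 4 are the extremes that the decision rule of Section 3 can produce. As a minimal sketch of Equations (1) and (2) and of that rule (not the authors' implementation), the code below computes a decision-point score from pre-test item scores and decides whether to expand a node. The function names, the KC labels and the example item scores are hypothetical; only the thresholds .8 and .5 and the branching logic come from the text.

```python
def kc_score(item_scores):
    """Equation (1): S_d is the mean of the pre-test item scores a_i for one KC."""
    return sum(item_scores) / len(item_scores)


def decision_point_score(kc_item_scores):
    """Equation (2): T_D is the minimum S_d over the KCs tied to decision point D."""
    return min(kc_score(items) for items in kc_item_scores.values())


def should_expand(is_rq_node, node_correct, t_d):
    """Return True when the node should be decomposed into its sub-steps.

    The four branches of the published rule collapse to one test: expand when
    the student's answer was incorrect or T_D is below the threshold, where the
    threshold is .8 for the RQ node and .5 for any other decision point.
    """
    threshold = 0.8 if is_rq_node else 0.5
    return (not node_correct) or t_d < threshold


# Hypothetical pre-test results: item scores (1 = right, 0 = wrong) per KC.
pretest = {
    "definition_of_net_force": [1, 1, 0, 1, 1],  # S_d = 0.8
    "newtons_second_law": [1, 1],                # S_d = 1.0
}
t_d = decision_point_score(pretest)              # min(0.8, 1.0)

skip_rq_discussion = not should_expand(is_rq_node=True, node_correct=True, t_d=t_d)
```

Because T_D is a minimum, a single weak KC is enough to trigger decomposition, so the rule errs on the side of covering the reasoning explicitly.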


   We are testing two versions of the system: one that always decomposes a target parent node into
simpler child nodes regardless of the student’s knowledge of the content to be discussed and a
second one that decomposes target parent nodes into simpler nodes or not, based on the student’s
pre-test performance on items that target the knowledge needed to answer the RQ correctly. The
second (adaptive) version of the system follows the algorithm described in the previous section.

4.1     Participants
   The initial comparison of the two versions of Rimac was conducted within high school physics
classes at one school in the Pittsburgh, PA area. The study followed the course unit on dynamics
with a total of 44 students participating. Students were randomly assigned to one of the two
conditions: the non-adaptive control condition (N = 22), and the adaptive experimental condition
(N = 22). We are currently collecting data from additional high school physics classes in four
other schools in the Pittsburgh, PA area.

4.2     Materials
   Students interacted with one of the two versions of Rimac to discuss the physics conceptual
knowledge associated with two quantitative dynamics problems. These problems and their associated
reflective dialogues (two to three dialogues per problem) were developed in consultation with high
school physics teachers.
   An online, automatically scored, 19-item multiple-choice pre-test and isomorphic post-test
(that is, each question was equivalent to a pre-test question, but with a different cover story)
were used to measure learning differences in students’ conceptual understanding of physics from
interactions with the system. Each test item was assigned a grade between 0 and 1 and scores for
each item were totaled so that the maximum score possible was 19.

4.3     Procedure
   On the first day, the teacher gave the online pre-test in class and assigned the two dynamics
problems. During the next one to two class days (approximately 90 minutes total) and as homework,
for each assigned problem students solved the problem on paper and then watched a video of a
sample, worked-out solution in one of the two versions of Rimac and engaged in two to three
“reflective dialogues” after each problem-solving video. The videos demonstrated how to solve the
problem only (as shown in Figure 2, which displays the end of video snapshot on the left) and did
not offer any conceptual explanations. Hence we do not believe that the videos contributed to
learning gains. Finally, at the next class meeting, the teacher gave the online post-test.

5.      INITIAL RESULTS
   We analyzed the data to determine whether students who interacted with the tutoring system
learned, as measured by differences from pre-test to post-test, regardless of their treatment
condition (i.e., which version of Rimac they were assigned to use), whether there was a difference
in learning gains between conditions and whether there was a difference in time on task between
conditions to complete both problems and their associated reflection questions and dialogues.

5.1     Learning Performance
   When comparing differences from pre to post-test using a paired samples t-test, for all
students combined post-test scores were significantly higher than pre-test scores (t(43) = 6.305,
p < 0.001, d = .805) and post-test scores were significantly higher than pre-test scores for
students in both the experimental condition (t(21) = 5.881, p < .001, d = 1.017), which adaptively
decomposes the highest node in the plan network or not (depending on students’ pre-test scores) and
selected sub-nodes, and the control condition (t(21) = 3.385, p = .003, d = .6451), which always
decomposes those nodes that can be decomposed (i.e., all but the leaf nodes) in the plan network.
These results suggest that students in the two conditions learned from both versions of the
system.
   When comparing the performance of the students who used the control version of the system to
the students who used the experimental version of the system, using an independent samples t-test,
there were no significant differences in the pre to post-test gain (t(42) = .995, p = .325, d =
.300) nor in the normalized gain (t(42) = 1.226, p = .113, d = 1.124). Thus, as we hypothesized,
the adaptive version of the system was not detrimental to students’ learning, which suggests that
the adaptive version of the system may have been decomposing just the target nodes that students
needed to have decomposed.

5.2   Efficiency of Learning
   When comparing the time on task of students who used the control version of the system to
students who used the experimental version of the system, using an independent samples t-test,
there were significant differences in the time on task to complete both problems (t(23) = 1.879,
p = .037, d = .567). The mean time on task for the experimental condition was 2653.9 seconds
(about 44 minutes) and for the control condition was 6801.5 seconds (about 1 hour and 53 minutes).
The average difference in time spent between conditions was about 1 hour and 9 minutes. Thus
students in the experimental condition spent significantly less time yet learned similar amounts
to students in the control condition in which all target nodes were decomposed. This suggests that
the version of the system used in the experimental condition may have accurately decided to
decompose the target nodes that individual students needed to have decomposed.

5.3   Additional Measures
   We also explored the frequency with which higher-level target nodes were actually decomposed by
examining T_D values for all students in the experimental condition for the second problem. All
but 2 of the 22 students needed at least 1 target node decomposed. The average number of
decompositions of target nodes was 5.14 with a minimum of 0 and a maximum of 10. Given that most
students needed some target nodes decomposed, this further suggests that the decision algorithm in
the experimental version of the system may have been accurate in its decisions about when to
decompose target nodes.
   In future work, we also need to explore the degree to which the algorithm may be deciding
unnecessarily that target nodes need to be decomposed (e.g., the thresholds need to be adjusted or
the pre-test is not a good measure for determining when a node needs to be decomposed). Missing a
necessary decomposition is likely to be detrimental to student learning. We can test for this
possibility in future work by creating another control version of the system that never decomposes
a target node when a student is able to answer it correctly. If the algorithm only rarely
decomposes nodes unnecessarily, we hypothesize that students who use the experimental version of
the system will learn significantly more than students who use the new control version.

6.   PRELIMINARY CONCLUSIONS AND FUTURE WORK
   We are exploring the effectiveness of a simple algorithm that decides whether or not to
decompose a step in a line of reasoning during tutorial dialogue. We developed two versions of the
Rimac system to test its effectiveness: one control version that always decomposes a step
regardless of the student’s knowledge level on the content involved and one experimental version
that decides whether or not to decompose a step based on the student’s knowledge of the content
involved in the step.
   We found that students who used the experimental (adaptive) version of the system, which
incorporates the simple decision algorithm, learned similarly to those students who used the
control (non-adaptive) version of the system, but that the students who used the experimental
version of the system were able to complete the same number of problems in less than half the time
that it took students who used the control system. This suggests that the algorithm was effective
in deciding when a step should be decomposed.
   In future work we will continue to analyze the number of node decompositions that occur for
students who use the adaptive system and we will test a version of the system in which there are
never any decompositions of target nodes that are answered correctly to further test the validity
of our decision algorithm. We will also explore additional adaptations that traverse the plan
network in different ways. After we have fine-tuned and validated our decision algorithm, we will
explore whether the algorithm will transfer to other tutorial dialogue domains.

7.   ACKNOWLEDGMENTS
   We thank Dennis Lusetich, Svetlana Romanova, and Scott Silliman. This research was supported by
the Institute of Education Sciences, U.S. Department of Education, through Grant R305A130441 to
the University of Pittsburgh. The opinions expressed are those of the authors and do not
necessarily represent the views of the Institute or the U.S. Department of Education.

8.   REFERENCES
[1] P. Albacete, P. W. Jordan, and S. Katz. Is a
    dialogue-based tutoring system that emulates helpful
    co-constructed relations during human tutoring
    effective? In 17th International Conference on Artificial
    Intelligence in Education, 2015.
[2] M. Chi, K. VanLehn, D. Litman, and P. Jordan.
    Empirically evaluating the application of reinforcement
    learning to the induction of effective and adaptive
    pedagogical strategies. User Modeling and
    User-Adapted Interaction, 21(1-2):137–180, 2011.
[3] M. Evens and J. Michael. One-on-one tutoring by
    humans and computers. Psychology Press, 2006.
[4] P. W. Jordan, B. Hall, M. Ringenberg, Y. Cui, and
    C. Rosé. Tools for authoring a dialogue agent that
    participates in learning studies. In Proceedings of AIED
    2007, pages 43–50, 2007.
[5] S. Katz and P. Albacete. A tutoring system that
    simulates the highly interactive nature of human
    tutoring. Journal of Educational Psychology,
    105(4):1126, 2013.
[6] A. Latham, K. Crockett, D. McLean, and B. Edmonds.
    Adaptive tutoring in an intelligent conversational agent
    system. In Transactions on Computational Collective
    Intelligence VIII, pages 148–167. Springer, 2012.
[7] D. Wood. Scaffolding, contingent tutoring, and
    computer-supported learning. International Journal of
    Artificial Intelligence in Education, 12(3):280–293,
    2001.
[8] R. M. Young, S. Ware, B. Cassell, and J. Robertson.
    Plans and planning in narrative generation: a review of
    plan-based approaches to the generation of story,
    discourse and interactivity in narratives. SDV. Sprache
    und Datenverarbeitung the International Journal for
    Language Data Processing, 37:41–64, 2014.