Using the Hint Factory to Compare Model-Based Tutoring Systems

Collin Lynch, North Carolina State University, 890 Oval Drive, Raleigh, NC 27695, cflynch@ncsu.edu
Thomas W. Price, North Carolina State University, 890 Oval Drive, Raleigh, NC 27695, twprice@ncsu.edu
Min Chi, North Carolina State University, 890 Oval Drive, Raleigh, NC 27695, mchi@ncsu.edu
Tiffany Barnes, North Carolina State University, 890 Oval Drive, Raleigh, NC 27695, tmbarnes@ncsu.edu

ABSTRACT

Model-based tutoring systems are driven by an abstract domain model and solver that is used for solution validation and student guidance. Such models are robust but costly to produce and are not always adaptive to specific students' needs. Data-driven methods such as the Hint Factory are comparatively cheaper and can be used to generate individualized hints without a complete domain model. In this paper we explore the application of data-driven hint analysis of the type used in the Hint Factory to existing model-based systems. We present an analysis of two probability tutors, Andes and Pyrenees. The former allows for flexible problem-solving while the latter scaffolds students' solution path. We argue that state-space analysis can be used to better understand students' problem-solving strategies and to highlight the impact of different design decisions. We also demonstrate the potential for data-driven hint generation across systems.

1. INTRODUCTION

Developers of model-based tutoring systems draw on domain experts to develop ideal models for student guidance. Studies of such systems have traditionally focused on their overall impact on students' performance and not on the students' user-system interaction. The Hint Factory, by contrast, takes a data-driven approach to extract advice based upon students' problem solving paths. In this paper we will apply the Hint Factory analytically to evaluate the impact of user interface changes and solution constraints between two closely-related tutoring systems for probability.

Model-based tutoring systems are based upon classical expert systems, which represent relevant domain knowledge via static rule bases or sets of constraints [9]. These knowledge bases are generally designed by domain experts or with their active involvement. They are then paired with classical search algorithms or heuristic satisfaction algorithms to automatically solve domain problems, identify errors in student solutions, and provide pedagogical guidance. The goal of the design process is to produce expert models that give the same procedural advice as a human expert. Classical model-based tutors have been quite successful in field trials, with systems such as the ACT Programming Tutor helping students achieve scores almost two standard deviations higher than those receiving conventional instruction [5].

Data-driven hint generation methods such as those used in the Hint Factory [17] take a different approach. Rather than using a strong domain model to generate a priori advice, data-driven systems examine prior student solution attempts to identify likely paths and common errors. This prior data can then be used to provide guidance by directing students towards successful paths and away from likely pitfalls. In contrast to the expert systems approach, these models are guided primarily not by what experts consider to be ideal but by what students do.

Model-based systems such as Andes [18] are advantageous as they can provide appropriate procedural guidance to students at any point in the process. Such models can also be designed to reinforce key meta-cognitive concepts and explicit solution strategies [4]. They can also scale up rapidly to include new problems or even new domain concepts, which can be incorporated into the existing system and will be available to all future users. Rich domain models, however, are comparatively expensive to construct and require the long-term involvement of domain experts to design and evaluate them.

Data-driven methods for generating feedback, by contrast, require a much lower initial investment and can readily adapt to individual student behaviors. Systems such as the Hint Factory are designed to extract solutions from prior student data, to evaluate the quality of those solutions, and to compile solution-specific hints [17]. While this avoids the need for a strong domain model, it is limited to the space of solutions explored by prior students. In order to incorporate new problems or concepts it is necessary to collect additional data. Additionally, such methods are not generally designed to incorporate or reinforce higher-level solution strategies.

We believe that both of these approaches have inherent advantages and are not necessarily mutually exclusive. Our goal in this paper is to explore what potential data-driven methods have to inform and augment model-based systems. We argue that data-driven methods can be used to: (1) evaluate the differences between closely-related systems; (2) assess the impact of specific design decisions made in those systems on user behaviors; and (3) evaluate the potential application of data-driven hint generation across systems. To that end we will survey relevant prior work on model-based and data-driven tutoring. We will describe two closely-related tutoring systems and the data collected from them. We will then present a series of analyses using state-based methods and discuss the conclusions that we drew from them.

Figure 1: The Andes user interface showing the problem statement window with workspace on the upper left hand side, the variable and equation windows on the right hand side, and the dialogue window on the lower left.

2. BACKGROUND

2.1 Model-Based Tutoring

Model-based tutoring systems take a classical expert-systems approach to tutoring. They are typically based upon a strong domain model, composed of declarative rules and facts representing domain principles and problem-solving actions, coupled with an automatic problem solver. This knowledge base is used to structure domain knowledge, define individual problems, evaluate candidate solutions, and provide student guidance. Novices typically interact with the system through problem solving, with the system providing solution validation, automatic feedback, pedagogical guidance, and additional problem-solving tasks. The Sherlock 2 system, for example, was designed to teach avionics technicians about appropriate diagnostic procedures [11]. The system relies on a domain model that represents the avionics devices being tested, the behavior of the test equipment, and rules about expert diagnostic methods. Sherlock 2 uses these models to pose dynamic challenges to problem solvers, to simulate responses to their actions, and to provide solution guidance.

Andes [19, 18, 20] and Pyrenees [4] are closely-related model-driven ITSs in the domains of physics and probability. They were originally developed at the University of Pittsburgh under the direction of Dr. Kurt VanLehn. Like other model-based systems, they rely on a rule-based domain model and automatic problem solvers that treat the domain rules as problem-solving steps. They distinguish between higher-level domain concepts, such as Bayes' Rule, and atomic steps, such as variable definitions. Principles are defined by a central equation (e.g. p(A|B) = (p(B|A) ∗ p(A))/p(B)) and encapsulate a set of atomic problem-solving steps such as writing the equation and defining the variables within it.

The systems are designed to function as homework helpers, with students logging into the system and being assigned or selecting one of a set of predefined problems. Each problem is associated with a pre-compiled solution graph that defines the set of possible solutions and problem-solving steps. The system uses a principle-driven automated problem solver to compile these graphs and to identify the complete solution paths. The solver is designed to implement the Target Variable Strategy (TVS), a backward-chaining problem solving strategy that proceeds from a goal variable (in this case the answer to the problem) via principle applications to the given information. The TVS was designed with the help of domain experts and guides solvers to define basic solution information (e.g. given variables) and then to proceed from the goal variable, using principles to define it in terms of the given variables.
Students working with Andes use a multi-modal user interface to write equations, define variables, and engage in other atomic problem-solving steps. A screenshot of the Andes UI can be seen in Figure 1. Andes allows students to solve problems flexibly, completing steps in any order so long as they are valid [20]. A step is considered to be valid if it matches one or more entries in the saved solution paths and all necessary prerequisites have been completed. Invalid steps are marked in red, but no other immediate feedback is given. Andes does not force students to delete or fix incorrect entries, as they do not affect the solution process. In addition to validating entries, the Andes system also uses the precompiled solution graphs to provide procedural guidance (next-step help). When students request help, the system will map their work to the saved solution paths. It will then select the most complete solution and prompt them to work on the next available step.

One of the original goals of the Andes system was to develop a tutor that operated as an "intelligent worksheet." The system was designed to give students the freedom to solve problems in any order and to apply their preferred solution strategy. The system extends this freedom by allowing invalid steps in an otherwise valid solution and by allowing students to make additional correct steps that do not advance the solution state or are drawn from multiple solution paths.
This was motivated in part by a desire to make the system work in many different educational contexts where instructors have their own preferred methods [20]. The designers of Andes also consciously chose to provide advice only upon demand, when the students would be most willing to accept it. For the students, however, particularly those with poor problem-solving skills, this passive guidance and comparative freedom can be problematic, as it does not force them to adhere to a strategy.

Figure 2: The Pyrenees user interface showing the problem statement at the top, the variable and equation lists on the left, and the tutor interaction window with calculator on the lower right.

This problem motivated the development of Pyrenees. Pyrenees, like Andes, acts as a homework helper and supports students with on-demand procedural and remediation help. It uses an isomorphic domain model with the same principles, basic steps, problems, and solution paths. Unlike Andes, however, Pyrenees forces students to apply the Target Variable Strategy during problem solving. It also requires them to repair incorrect entries immediately before moving on. Students are guided through the solution process with a menu-driven interface, shown in Figure 2. At each step, the system asks students what they want to work on next and permits them to make any valid step that is consistent with the TVS. Chi and VanLehn [3] conducted a study of the two systems and found that scaffolding the TVS in Pyrenees helped to eliminate the gap between high and low learners. This effect was observed both in the original domain where the strategy was taught (in their case probability) and in a transfer domain (physics), where students used Andes alone.

2.2 Data-Extraction and Data-Driven Tutoring

One of the longstanding goals of educational data miners is to support the development of data-driven tutoring systems. Such systems use past student data to structure pedagogical and domain knowledge, administer conceptual and pedagogical advice, or evaluate student performance and needs. A number of attempts have been made to address these goals. One of the most successful data-driven systems is the Hint Factory [1, 2, 17]. The Hint Factory takes an MDP-based approach to hint generation. It takes as input a set of prior student logs for a given problem, represented as a network of interactions [6, 7]. Each vertex in this network represents the state of a student's partial solution at some point during the problem solving process, and each edge represents an action that takes the student from one state to another. A complete solution is represented as a path from the initial state to a goal state. Each state in the interaction network is assigned a weight via a value-iteration algorithm. A new student requesting a hint is matched to a previously observed state and given context-sensitive advice. If, for example, the student is working on a problem that requires Bayes' Rule and has already defined p(A), p(B), and p(B|A), then the Hint Factory would first prompt them to consider defining p(A|B), then it would point them to Bayes' Rule, before finally showing them the equation p(A|B) = (p(B|A) ∗ p(A))/p(B).

These hints are incorporated into existing tutoring systems in the form of a lookup table that provides state-specific advice. When a user asks for help, the tutor will match their current state to an index state in the lookup table and will prompt them to take the action that will lead them to the highest-value neighboring state. If their current state is not found then the tutor will look for a known prior state or will give up. The Hint Factory has been applied successfully in a number of domains including logic proofs [17], data structures [8], and programming [15, 10, 13]. Researchers have also explored other related methods for providing data-driven hints. These include alternative state representations [13], path construction algorithms [16, 14], and example-based model construction [12].

The primary goal of the Hint Factory is to leverage prior data to provide optimal state-specific advice. By calculating advice on a per-state basis, the system is able to adapt to students' specific needs by taking into account both their current state and the paths that they can take to reach the goal. As a consequence, the authors of the Hint Factory argue that this advice is more likely to be in the students' Zone of Proximal Development and thus more responsive to their needs than a less-sensitive algorithm.
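
As a concrete illustration, the sketch below (our simplification, not the Hint Factory's released implementation; the reward, cost, and discount constants are assumptions) assigns a value to each state of an interaction network by value iteration and then, for a matched state, suggests the action whose successor has the highest value.

    # Simplified sketch (ours) of MDP-style hint selection over an
    # interaction network of the form {state: {action: next_state}}.
    def value_iterate(network, goal_states, goal_reward=100.0,
                      step_cost=1.0, discount=0.9, sweeps=100):
        """Assign a value to every observed state; goal states are fixed at
        goal_reward and other states back up the best discounted successor."""
        values = {s: 0.0 for s in network}
        for g in goal_states:
            values[g] = goal_reward
        for _ in range(sweeps):
            for state, actions in network.items():
                if state in goal_states or not actions:
                    continue
                values[state] = max(discount * values.get(nxt, 0.0) - step_cost
                                    for nxt in actions.values())
        return values

    def next_step_hint(state, network, values):
        """Return the action whose successor has the highest value, or None
        if the state was never observed in the prior data."""
        if state not in network or not network[state]:
            return None
        return max(network[state],
                   key=lambda act: values.get(network[state][act], 0.0))

In practice the network and its goal states are derived from the prior student logs described above, and the resulting policy is compiled into the lookup table that the tutor consults at help time.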

3. METHODS

In order to investigate the application of data-driven methods to model-based tutoring systems, we collected data from two studies conducted with Andes and Pyrenees in the domain of probability. We then transformed these datasets into interaction networks, consisting of states linked with actions. We used this representation to perform a variety of quantitative and qualitative analyses with the goal of evaluating the differences between the two systems and the impact of the specific design decisions that were made in each.

3.1 The Andes and Pyrenees Datasets

The Andes dataset was drawn from an experiment conducted at the University of Pittsburgh [3]. This study was designed to assess the differential impact of instruction in Andes and Pyrenees on students' meta-cognitive and problem-solving skills. Participants in this study were college undergraduates who were required to have taken high-school level algebra and physics but not to have taken a course in probability or statistics. The participants were volunteers and were paid by time, not performance.

Forty-four students completed the entire study. However, for the purposes of the present analysis, we drew on all 66 students who completed at least one problem in Andes-Probability.
This is consistent with prior uses of the Hint Factory, which draw from all students, including those who did not complete the problem. The Pyrenees-Probability logs from this study were not used due to problems with the data format that prevented us from completing our analysis. From this dataset we drew 394 problem attempts covering 11 problems. The average number of steps required to solve the problems was 17.6. For each problem we analyzed between 25 and 72 problem attempts, with an average of 35.8 attempts per problem. Some attempts were from the same student, with at most two successful attempts per student. Over all problems, 81.7% of the attempts were successful, with the remainder being incomplete attempts.

The Pyrenees dataset was drawn from a study of 137 students conducted in the 200-level Discrete Mathematics course in the Department of Computer Science at North Carolina State University. This study used the same probability textbook and pre-training materials as those used in the Andes study. The students used Pyrenees as part of a homework assignment, in which they completed 12 problems using the tutoring system. One of these problems was not represented in the Andes dataset. We therefore excluded it from our analysis, leaving 11 shared problems.

Unlike the Andes students, however, the Pyrenees students were not always required to solve every problem. In this study the system was configured to randomly select some problems or problem steps to present as worked examples rather than as steps to be completed. In order to ensure that the results were equivalent, we excluded the problem-level worked examples and any attempt with a step-level worked example from our analysis. As a consequence, each problem included a different subset of these students. For each problem we analyzed between 83 and 102 problem attempts, with an average of 90.8 attempts per problem. Some attempts were from the same student, with at most one successful attempt per student. Over all problems, 83.4% of the attempts were successful.

3.2 State and Action Representations

In order to compare the data from both tutors, we represented each problem as an interaction network, a representation used originally in the Hint Factory [7]. In the network a vertex, or state, represents the sum total of a student's current problem solving steps at a given time during a problem-solving attempt. Because Andes permits flexible step ordering while Pyrenees does not, we chose to represent the problem solving state s_t as the set of valid variables and equations defined by the student at time t.

A variable is a probabilistic expression, such as P(A ∪ B), that the student has identified as important to solving the problem, and for which the probability is known or sought. An equation represents the application of a principle of probability, which relates the values of defined variables, such as the Complement Theorem, P(A) + P(¬A) = 1. Because such equations can be written in many algebraically equivalent ways, we represent each equation as a 2-tuple consisting of the set of variables included in the equation (e.g. {P(A), P(¬A)}) and the principle being applied (e.g. Complement Theorem). Because we only represent valid equations, this representation uniquely identifies any equation for a given problem. Because we used the same state representation for both tutors, we were able to compare states directly across tutors.

Additionally, we opted to ignore incorrect entries. Pyrenees prevents students from applying the principles of probability improperly and forces them to correct any mistakes immediately; any errors in the student logs are therefore immediately removed, making them uninformative. Andes, by contrast, gives students free rein when writing equations and making other entries. This freedom resulted in syntactic errors and improper rule-application errors in our dataset. The meaning of these invalid equations is inherently ambiguous and therefore difficult to incorporate into a state definition. However, such errors are immediately flagged by the system and may be disregarded by the student without consequence, as they do not affect the validity of the answer; we therefore ignored them as well.

An edge, or action, in our network represents the correct application of a rule or a correct variable definition and leads to a transition from one state to another. For the present dataset and state representation, the possible actions were the definition or deletion of variables or equations. Each of these actions was possible in both tutors.
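
A minimal sketch of this state and action representation is given below (ours; the string encodings shown are our own shorthand). The key point is that states are order-free sets, so two students who make the same entries in different orders map to the same state.

    # Sketch (ours) of the order-free state representation. A variable is a
    # string such as "P(A)"; an equation is the 2-tuple of (the frozenset of
    # variables it relates, the principle name).
    def make_equation(variables, principle):
        return (frozenset(variables), principle)

    def make_state(variables, equations):
        """A state is the set of valid entries made so far, regardless of the
        order in which they were entered."""
        return frozenset(variables) | frozenset(equations)

    # Two students who define the same entries in different orders map to the
    # same state, so states can be compared directly across tutors:
    s1 = make_state({"P(A)", "P(~A)"},
                    [make_equation({"P(A)", "P(~A)"}, "Complement Theorem")])
    s2 = make_state({"P(~A)", "P(A)"},
                    [make_equation({"P(~A)", "P(A)"}, "Complement Theorem")])
    assert s1 == s2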
4. ANALYSIS

In order to develop a broader understanding of our datasets, we first visualized the interaction network for each problem as a weighted, directed graph. We included attempts from both Andes and Pyrenees in the network, and weighted the edges and vertices by the frequency with which they appeared in the logs. We annotated each state and edge with the weight contributed by each tutor. Two examples of these graphs are given in Figure 3.

Throughout this section, we will use these graphs to address the points we outlined at the end of Section 1. We begin with a case study from one problem and will explore students' problem solving strategies using our graph representation. We will then compare the Andes and Pyrenees systems with a variety of metrics based on this representation. We will relate our observations back to the design decisions of each system and identify evidence that may support or question these decisions. Finally, we will show how the analysis methods associated with data-driven hint generation can be used to validate some of these findings.
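
The weighted graphs described above can be assembled directly from the interaction networks. The sketch below (ours; the attempt format and tutor labels are illustrative) accumulates per-tutor counts for each state and transition, which can then be handed to any graph-drawing tool.

    # Sketch (ours): accumulate per-tutor weights for the states and edges of
    # an interaction network. Each attempt is an ordered list of states as
    # defined in Section 3.2; "tutor" is assumed to be "andes" or "pyrenees".
    from collections import defaultdict

    def build_weighted_network(attempts):
        """attempts: iterable of (tutor, [state0, state1, ...]) pairs."""
        state_weights = defaultdict(lambda: {"andes": 0, "pyrenees": 0})
        edge_weights = defaultdict(lambda: {"andes": 0, "pyrenees": 0})
        for tutor, path in attempts:
            for state in path:
                state_weights[state][tutor] += 1
            for src, dst in zip(path, path[1:]):
                edge_weights[(src, dst)][tutor] += 1
        return state_weights, edge_weights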
4.1 Case Study: Problem Ex242

The graphical representation of a problem is very helpful for giving a high-level overview of a problem and performing qualitative analysis. Problem Ex242, shown in Figure 3, presents an interesting scenario for a number of reasons. The problem was the 10th in a series of 12 practice problems, and asked the following:

    Events A, B and C are mutually exclusive and exhaustive events with p(A) = 0.2 and p(B) = 0.3. For an event D, we know p(D|A) = 0.04, p(D|B) = 0.03, and p(C|D) = 0.3. Determine p(B|D).

Figure 3: A graph representation of problems Ex252a, left, and Ex242, right. States and edges are colored on a gradient from blue to yellow, indicating the number of students who reached that state in the Andes and Pyrenees tutors respectively. Rounded edges indicate that at least one student from both tutors is present in a state. A green border indicates a solution state, and a pink border indicates that a state is contained in the pedagogically "ideal" solution. Edge thickness corresponds to the natural log of the number of problem attempts which included the given edge.

The problem is notable in the Pyrenees dataset because it was the only one in which the students were split almost evenly between two solution paths. For most of the problems in the dataset the vast majority of students followed the optimal solution path, with only a few finding alternatives. The ideal solution path, as suggested by Pyrenees' domain model, employed repeated applications of the Conditional Probability Theorem, P(A ∩ B) = P(A|B)P(B), which the problem was designed to teach. The students had previously been exposed to Bayes' Theorem, however, and over half of them chose to apply it instead. This allowed them to circumvent one variable definition and two applications of the Conditional Probability Theorem, achieving a slightly shorter solution path. We make no argument here about which path the tutor should encourage students to take, but it is worth noting that the Hint Factory, when trained on the Pyrenees data for this problem, recommends the shorter, more popular path.

The Andes dataset gives us a very different set of insights into this problem. Because Andes lacks the strong process scaffolding of Pyrenees, students were able to make a wider variety of choices, leading to a graph with many more, less populous states. While almost every state and edge in the Pyrenees graph represents multiple students, the Andes graph contains a number of paths, including solutions, that were reached by only one student. In some state-based analyses the authors choose to omit these singleton states, for instance when generating hints. We have chosen to include them as they represent a fair proportion of the Andes data. For instance, 62 of the 126 Andes states for Ex242 were singleton states.

Interestingly, while there were small variations among their solutions, all of the Andes students chose to apply Bayes' Rule rather than relying solely on the Conditional Probability Theorem as suggested by Pyrenees. This, coupled with the strong proportion of Pyrenees students who also chose the Bayes' Rule solution, indicates that the solution offered by Pyrenees may be unintuitive for students, especially if they have recently learned Bayes' Rule. Again, this can be interpreted as evidence that Pyrenees' strong guidance did have an impact on students' problem solving strategies, but it also raises concerns about how reasonable this guidance will appear to the students. Regardless of one's interpretation, an awareness of a trend like this can help inform the evolution of model-based tutors like Andes and Pyrenees.

4.2 Comparing datasets

We now turn to qualitatively comparing the datasets. While it is not a common practice to directly compare data from different tutors, we argue that it is appropriate, especially in this context. In longstanding tutoring projects it is common for developers and researchers to make many substantive changes. The Andes system itself has undergone substantial interface changes over the course of its development [20]. These changes can alter student behavior in substantial ways, and it is important for researchers to consider how they affect not just learning outcomes but also problem solving strategies, as was investigated by Chi et al. [4].

In many respects the close relationship between Andes and Pyrenees makes them analogous to different versions of the same tutor, and the presence of an isomorphic knowledge base and problem set makes it possible for us to draw meaningful comparisons between students. In this section we will inspect how the scaffolding design decisions made when constructing the tutors affected the problem solving strategies exhibited by the students.

A visual inspection of the state graphs for each problem revealed that significant portions of each graph were shared between the two datasets, along with portions that were represented in only one of the two. Despite the fact that students using Andes were capable of reaching any of the states available to students in Pyrenees, many Pyrenees states were never discovered by Andes students. As noted in Section 4.1, this suggests that guidance from the Pyrenees tutor is successful in leading students down solution paths that they would not otherwise have discovered, possibly applying skills that they would otherwise not have used.

To quantify these findings, we calculated the relative similarity of students in each tutor. For a given problem, we defined the state-similarity between datasets A and B as the probability that a randomly selected state from a student in A will be passed through by a randomly selected student in B. Recall from Section 3.2 that our state representation allows us to directly compare states across tutors. By this definition, the self-similarity of a dataset is a measure of how closely its students overlap each other, while the cross-similarity is a measure of how closely its students overlap with the other dataset.
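
The sketch below (ours) implements one straightforward reading of this definition; the exact weighting used in the original analysis, for example whether states are sampled per student or pooled, is not specified here. The same function yields the action-similarity defined later in this section when per-student action sets are passed instead of state sets.

    # Sketch (ours): similarity between datasets A and B, where each dataset
    # is a list of per-student sets of items (states here, actions later).
    def similarity(A, B):
        """Estimated probability that a randomly selected item from a student
        in A appears in the work of a randomly selected student in B."""
        probs = []
        for items_a in A:
            for item in items_a:
                probs.append(sum(item in items_b for items_b in B) / len(B))
        return sum(probs) / len(probs) if probs else 0.0

    # self-similarity:  similarity(andes_states, andes_states)
    # cross-similarity: similarity(andes_states, pyrenees_states)
    # "Solution" column: treat the ideal solution path as a one-student dataset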
Similarity measures for the datasets can be found at the top of Table 1.

    States      Andes            Pyrenees         Solution
    Andes       0.551 (0.134)    0.494 (0.153)    0.478 (0.186)
    Pyrenees    0.460 (0.141)    0.688 (0.106)    0.669 (0.146)
    Actions     Andes            Pyrenees         Solution
    Andes       0.878 (0.085)    0.874 (0.118)    0.851 (0.140)
    Pyrenees    0.828 (0.117)    0.936 (0.021)    0.923 (0.036)

Table 1: Pairwise similarity across tutors and the ideal solution path. Similarities were calculated for each problem, and each cell lists the mean (and standard deviation) over all problems. The top half of the table covers the state-similarity metrics while the bottom half covers action-similarity.

Predictably, both datasets have higher self-similarity than cross-similarity, with Pyrenees showing higher self-similarity than Andes. This indicates that Pyrenees students chose more homogeneous paths to the goal. This is reasonable and consistent with the heavy scaffolding that is built into the system. It is important to note that our similarity metrics are not symmetric. The cross-similarity of Pyrenees with Andes is higher than the reverse. This indicates that the paths taken by Pyrenees students are more likely to have been observed among Andes students than vice-versa. This has important implications for designers who are interested in collecting data from a system that is undergoing modifications. If a system becomes increasingly scaffolded and restrictive over time, past data will remain more relevant than in a system that is relaxed. In many ways this simply reflects the intuition that allowing students to explore a state space more fully will produce more broadly useful data, while restricting students will produce data that is more narrowly useful. Note that here we are only observing trends, and we make no claims of statistical significance.

In our analysis, we found that, within both datasets, many solution paths or sub-paths differed only in the order in which actions were performed. In our domain, many actions do not have ordering constraints. It is possible, for example, to define the variables A and B in either order, and the resulting solution paths would deviate from one another. We thus sought to determine how much of the observed difference between our two datasets was due to these ordering effects. To that end we define the action-similarity between datasets A and B as the probability that a randomly selected action performed by a student in A will have been performed by a randomly selected student in B. These values are shown in the bottom of Table 1, and each of the trends observed for state-similarity holds, with predictably higher similarity values.

It is notable that the similarity between Pyrenees and Andes is almost as high as Andes' self-similarity, indicating that the actions taken by Pyrenees students are almost as likely to be observed in Andes students as in Pyrenees students. This suggests that, for the most part, the Andes students performed a superset of the actions performed by the Pyrenees students. Thus the impact of Pyrenees is most visible in the order of execution, not the actions chosen. This is consistent with the design goals of Pyrenees, which was set up to guide students along the otherwise unfamiliar path of the TVS.

We also opted to examine the impact of the variable definitions on our evaluation. As noted above, variable definitions are atomic actions. They do not depend upon any event assertion and thus have no ordering constraints, unlike the principles. We did so with the hypothesis that this would increase the similarity metrics for the datasets by eliminating the least constrained decisions from consideration. Our results are shown in Table 2 below. Contrary to our expectations, this actually reduced the similarity both within and across the datasets, with the exception of Pyrenees' self-similarity. Thus the unconstrained variable definitions did not substantially contribute to the dissimilarity. Rather, most of the variation lay in the order of principle applications.

    States      Andes            Pyrenees         Solution
    Andes       0.419 (0.150)    0.327 (0.139)    0.253 (0.172)
    Pyrenees    0.372 (0.193)    0.709 (0.145)    0.601 (0.214)
    Actions     Andes            Pyrenees         Solution
    Andes       0.818 (0.151)    0.582 (0.168)    0.636 (0.205)
    Pyrenees    0.678 (0.212)    0.899 (0.038)    0.879 (0.055)

Table 2: Pairwise similarity across tutors and the ideal solution path calculated using a variable-free state representation. Rows and columns are the same as in Table 1.

    Problem     Andes          Pyrenees
    ex132       26 (47.5%)     2 (98.7%)
    ex132a      13 (33.3%)     1 (100.0%)
    ex144       2 (96.6%)      1 (100.0%)
    ex152       11 (0.0%)      4 (0.0%)
    ex152a      8 (59.0%)      3 (97.4%)
    ex152b      12 (0.0%)      1 (100.0%)
    ex212       8 (71.4%)      1 (100.0%)
    ex242       9 (0.0%)       2 (49.38%)
    ex252       7 (76.9%)      2 (98.4%)
    ex252a      4 (81.8%)      2 (98.6%)
    exc137      19 (0.0%)      2 (98.75%)

Table 3: For each problem, the table gives the number of unique solution states represented in each tutor's dataset. Note that there may exist many solution paths which reach a given solution. The percentage in parentheses gives the proportion of solution paths that ended in the pedagogically ideal solution.

4.3 Similarity to an "ideal" solution

We now turn to exploring how well the ideal solution was represented in the datasets. For both tutors the ideal solution is the pedagogically-desirable path constructed via the TVS. Our measure of cross-similarity between two datasets can also be applied between a single solution path and a dataset by treating the single solution as a set of one. We can thus measure the average likelihood of an ideal solution state appearing in a student's solution from each dataset. The results of this calculation are shown in Tables 1 and 2, using both the state- and action-similarities explained earlier.
Predictably, the solution has a high similarity with Pyrenees students, as these students are scaffolded tightly and offered few chances to deviate from the path.

As Table 3 shows, Pyrenees students were funneled almost exclusively to the ideal solution on the majority of problems, even if their paths to the solution were variable. We found only one problem, Ex152, where the Pyrenees students missed the ideal path. That was traced to a programming error that forced students along a similar path. Otherwise, there was only one problem, Ex242 (discussed in Section 4.1), where a meaningful percentage of students chose a different solution. The Andes students, by contrast, were much less likely to finish in the ideal solution state, but this was also problem-dependent.

4.4 Applications of the Hint Factory

Finally, having shown that the datasets differ, and that these differences are consistent with the differing design choices of the two tutors, we sought to determine what effect those differences would have on data-driven hint generation. Our goal was to determine how applicable a hint model of the type produced by the Hint Factory would be for one dataset if it was trained on another. To that end we performed a modified version of the Cold Start experiment [1], which is designed to measure the number of state-specific hints that the Hint Factory can provide given a randomly selected dataset. The Cold Start experiment functions like leave-one-out cross-validation for state-based hint generation. In the original Cold Start experiment, one student was selected at random and removed from the dataset to represent a "new" student using the tutor. Each remaining student in the dataset was then added, one at a time, in a random order to the Hint Factory's model. On each iteration, the model is updated and the percentage of states on the "new" student's path for which a hint is available is calculated. This is repeated a desired number of times with new students to account for ordering effects.
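
One run of this procedure can be sketched as follows (ours; the representation of a student as a set of path states and the hint-availability test are simplifications). Passing training and held-out students from different tutors gives the cross-tutor variant described next.

    # Sketch (ours) of a single run of the (possibly cross-tutor) Cold Start
    # experiment. Each student is represented by the set of states on their
    # path; "a hint is available" is simplified to "the state has been seen".
    import random

    def cold_start_curve(train_students, test_students, rng=random):
        """Return, for each training-set size, the fraction of a held-out
        student's states that appear in the training data added so far."""
        new_student = rng.choice(test_students)
        pool = [s for s in train_students if s is not new_student]
        rng.shuffle(pool)
        seen, curve = set(), []
        for student in pool:
            seen |= student
            covered = sum(1 for state in new_student if state in seen)
            curve.append(covered / len(new_student))
        return curve

    # Averaging many runs over problems and random orderings yields curves
    # like those in Figure 4 (AvA, PvP, AvP, and PvA).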
For the present study we calculated cold-start curves for both the Pyrenees and Andes datasets. We also calculated curves using the opposing dataset to illustrate the growth rate for cross-tutor hints. For these modified curves we selected the hint-generating students from the opposing dataset. All four curves are shown in Figure 4. Here AvA and PvP designate the within-tutor curves for Andes and Pyrenees respectively, while PvA and AvP designate the cross-tutor curves for hints from Pyrenees provided to Andes users and vice-versa. Figure 4 represents an average over all problems, and therefore the x-axis extends only as far as the minimum number of students to complete a problem. As the curves illustrate, the within-tutor curves reach high rates of coverage relatively quickly, with PvP reaching a plateau above 95% after 21 students and AvA reaching 85%.

Figure 4: The four Cold Start curves, averaged across all problems. The x-axis shows the number of students used to train the model, and the y-axis shows the percentage of a new student's path that has available hints. The curve labeled "XvY" indicates training on the X dataset and selecting a new student from the Y dataset (A = Andes; P = Pyrenees).

The cross-tutor curves, by contrast, reach much lower limits. AvP reaches a plateau of over 75% coverage, while PvA reaches a plateau of 60% coverage. This reflects the same trends observed in Tables 1 and 2, where Andes better explains the Pyrenees data than vice-versa; however, neither dataset completely covers the other. On the one hand this result is somewhat problematic, as it indicates that prior data has a limited threshold for novel tutors or novel versions of a system in the same domain. Clearly a substantive interface and scaffolding change of the type made in Pyrenees can change the state space sufficiently that we cannot trivially rely on our prior data. On the other hand, while the cross-application of data does have upper limits, those limits are comparatively high. Data from a prior system can be reused and can serve as a reliable baseline for a novel system, with the caveat that additional exploratory data is required.

5. DISCUSSION AND CONCLUSION

Our goal in this paper was to evaluate the application of data-driven methods such as the Hint Factory to model-based tutoring systems. To that end we analyzed and compared datasets collected from two closely-related tutoring systems: Andes and Pyrenees. Through our analysis we sought to: (1) evaluate the differences between closely-related systems; (2) assess the impact of specific design decisions made in those systems on user behaviors; and (3) evaluate the potential application of data-driven hint generation across systems.

We found that, while the systems shared isomorphic domain models, problems, and ideal solutions, the observed user behaviors differed substantially. Students using the Andes system explored the space more widely, were more prone to identify novel solutions, and rarely followed the ideal solution path. Students in Pyrenees, by contrast, were far more homogeneous in their solution process and were limited in the alternative routes they explored. Contrary to our expectations, we found that this variation was not due to simple ordering variations in the simplest of steps but to alternative strategy selection for the higher-level domain principles. This is largely consistent with the design decisions that motivated both systems and with the results of prior studies.

We also found that the state-based hint generation method used in the Hint Factory can be applied to the Andes and Pyrenees data given a suitable state representation. For this analysis we opted for a set-based representation given the absence of strong ordering constraints across the principles. We then completed a cold-start analysis to show that the cross-tutor data could be used to bootstrap the construction of hints for a novel system but does not provide complete coverage.

Ultimately we believe that the techniques used for data-driven hint generation have direct application to model-based systems. Data-driven analysis can be used to identify the behavioral differences between closely related systems and, we would argue, changes from one version of a system to another. We also found that these changes can be connected to the specific design decisions made during development. Further, we found that data-driven methods can be applied to model-based tutoring data to generate state-based hints. We believe that hint information of this type may be used to supplement the existing domain models to produce more user-adaptive systems. In future work we plan to apply these analyses to other appropriate datasets and to test the incorporation of state-driven hints or hint refinement into existing domain models.

6. ACKNOWLEDGMENTS

Work supported by NSF Grant #1432156, "Educational Data Mining for Individualized Instruction in STEM Learning Environments," Min Chi & Tiffany Barnes Co-PIs.

7. REFERENCES

[1] T. Barnes and J. Stamper. Toward automatic hint generation for logic proof tutoring using historical student data. In Intelligent Tutoring Systems (ITS), pages 373-382, 2008.
[2] T. Barnes and J. C. Stamper. Automatic hint generation for logic proof tutoring using historical data. Educational Technology & Society, 13(1):3-12, 2010.
[3] M. Chi and K. VanLehn. Eliminating the gap between the high and low students through meta-cognitive strategy instruction. In Intelligent Tutoring Systems (ITS), volume 5091, pages 603-613, 2008.
[4] M. Chi and K. VanLehn. Meta-cognitive strategy instruction in intelligent tutoring systems: How, when, and why. Educational Technology & Society, 13(1):25-39, 2010.
[5] A. Corbett. Cognitive computer tutors: Solving the two-sigma problem. In User Modeling 2001, pages 137-147, 2001.
[6] M. Eagle and T. Barnes. Exploring differences in problem solving with data-driven approach maps. In Educational Data Mining (EDM), 2014.
[7] M. Eagle and T. Barnes. Exploring networks of problem-solving interactions. In Learning Analytics (LAK), 2015.
[8] D. Fossati, B. D. Eugenio, and S. Ohlsson. I learn from you, you learn from me: How to make iList learn from students. In Artificial Intelligence in Education (AIED), 2009.
[9] F. Hayes-Roth, D. A. Waterman, and D. B. Lenat. Building Expert Systems. Addison-Wesley Publishing Company Inc., Reading, Massachusetts, U.S.A., 1983.
[10] A. Hicks, B. Peddycord III, and T. Barnes. Building games to learn from their players: Generating hints in a serious game. In Intelligent Tutoring Systems (ITS), pages 312-317, 2014.
[11] S. Katz, A. Lesgold, E. Hughes, D. Peters, G. Eggan, M. Gordin, and L. Greenberg. Sherlock 2: An intelligent tutoring system built on the LRDC framework. In C. P. Bloom and R. B. Loftin, editors, Facilitating the Development and Use of Interactive Learning Environments, Computers, Cognition, and Work, chapter 10, pages 227-258. Lawrence Erlbaum Associates, Mahwah, New Jersey, 1998.
[12] R. Kumar, M. E. Roy, B. Roberts, and J. I. Makhoul. Toward automatically building tutor models using multiple behavior demonstrations. In S. Trausan-Matu, K. E. Boyer, M. Crosby, and K. Panourgia, editors, Proceedings of the 12th International Conference on Intelligent Tutoring Systems, LNCS 8474, pages 535-544. Springer Verlag, 2014.
[13] B. Peddycord III, A. Hicks, and T. Barnes. Generating hints for programming problems using intermediate output. In Proceedings of the 7th International Conference on Educational Data Mining (EDM 2014), pages 92-98, 2014.
[14] C. Piech, M. Sahami, J. Huang, and L. Guibas. Autonomously generating hints by inferring problem solving policies. In Learning at Scale (LAS), 2015.
[15] K. Rivers and K. Koedinger. Automatic generation of programming feedback: A data-driven approach. In The First Workshop on AI-supported Education for Computer Science (AIEDCS 2013), 2013.
[16] K. Rivers and K. Koedinger. Automating hint generation with solution space path construction. In Intelligent Tutoring Systems (ITS), 2014.
[17] J. C. Stamper, M. Eagle, T. Barnes, and M. J. Croy. Experimental evaluation of automatic hint generation for a logic tutor. International Journal of Artificial Intelligence in Education, 22(1-2):3-17, 2013.
[18] K. VanLehn, C. Lynch, K. G. Schulze, J. A. Shapiro, R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and M. Wintersgill. The Andes physics tutoring system: Five years of evaluations. In C. Looi, G. I. McCalla, B. Bredeweg, and J. Breuker, editors, Artificial Intelligence in Education - Supporting Learning through Intelligent and Socially Informed Technology, Proceedings of the 12th International Conference on Artificial Intelligence in Education, AIED 2005, Amsterdam, The Netherlands, volume 125 of Frontiers in Artificial Intelligence and Applications, pages 678-685. IOS Press, 2005.
[19] K. VanLehn, C. Lynch, K. G. Schulze, J. A. Shapiro, R. Shelby, L. Taylor, D. Treacy, A. Weinstein, and M. Wintersgill. The Andes physics tutoring system: Lessons learned. International Journal of Artificial Intelligence in Education, 15(3):147-204, 2005.
[20] K. VanLehn and B. van de Sande. The Andes physics tutoring system: An experiment in freedom. In Advances in Intelligent Tutoring Systems, pages 421-443, 2010.