<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Using the Hint Factory to Compare Model-Based Tutoring Systems</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Collin Lynch</string-name>
          <email>cflynch@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Min Chi</string-name>
          <email>mchi@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Thomas W. Price</string-name>
          <email>twprice@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tiffany Barnes</string-name>
          <email>tmbarnes@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>North Carolina State University</institution>
          ,
          <addr-line>890 Oval Drive, Raleigh, NC 27695</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Model-based tutoring systems are driven by an abstract domain model and solver that is used for solution validation and student guidance. Such models are robust but costly to produce and are not always adaptive to specific students' needs. Data-driven methods such as the Hint Factory are comparatively cheaper and can be used to generate individualized hints without a complete domain model. In this paper we explore the application of data-driven hint analysis of the type used in the Hint Factory to existing model-based systems. We present an analysis of two probability tutors, Andes and Pyrenees. The former allows for flexible problem-solving while the latter scaffolds students' solution paths. We argue that state-space analysis can be used to better understand students' problem-solving strategies and to highlight the impact of different design decisions. We also demonstrate the potential for data-driven hint generation across systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        Developers of model-based tutoring systems draw on
domain experts to develop ideal models for student guidance.
Studies of such systems have traditionally been focused on
their overall impact on students' performance and not on the
students' user-system interaction. The Hint Factory, by
contrast, takes a data-driven approach to extract advice based
upon students' problem solving paths. In this paper we will
apply the Hint Factory analytically to evaluate the impact
of user interface changes and solution constraints between
two closely-related tutoring systems for probability.
Model-based tutoring systems are based upon classical
expert systems, which represent relevant domain knowledge
via static rule bases or sets of constraints [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. These
knowledge bases are generally designed by domain experts or with
their active involvement. They are then paired with classical
search algorithms or heuristic satisfaction algorithms to
automatically solve domain problems, identify errors in student
solutions, and to provide pedagogical guidance. The goal of
the design process is to produce expert models that give the
same procedural advice as a human expert. Classical
model-based tutors have been quite successful in field trials, with
systems such as the ACT Programming Tutor helping
students achieve almost two standard deviations higher than
those receiving conventional instruction [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Data-driven hint generation methods such as those used in
Hint Factory [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] take a different approach. Rather than
using a strong domain model to generate a priori advice,
data-driven systems examine prior student solution attempts to
identify likely paths and common errors. This prior data
can then be used to provide guidance by directing students
towards successful paths and away from likely pitfalls. In
contrast to the expert systems approach, these models are
primarily guided not by what experts consider to be ideal
but by what students do.
      </p>
      <p>
        Model-based systems such as Andes [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] are advantageous
as they can provide appropriate procedural guidance to
students at any point in the process. Such models can also be
designed to reinforce key meta-cognitive concepts and
explicit solution strategies [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. They can also scale up rapidly
to include new problems or even new domain concepts which
can be incorporated into the existing system and will be
available to all future users. Rich domain models, however,
are comparatively expensive to construct and require the
long-term involvement of domain experts to design and
evaluate them.
      </p>
      <p>
        Data-driven methods for generating feedback, by contrast,
require much lower initial investment and can readily adapt
to individual student behaviors. Systems such as the Hint
Factory are designed to extract solutions from prior student
data, to evaluate the quality of those solutions, and to
compile solution-specific hints [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. While this avoids the need
for a strong domain model, it is limited to the space of
solutions explored by prior students. In order to incorporate
new problems or concepts it is necessary to collect additional
data. Additionally, such methods are not generally designed
to incorporate or reinforce higher-level solution strategies.
We believe that both of these approaches have inherent
advantages and are not necessarily mutually exclusive. Our
goal in this paper is to explore what potential data-driven
methods have to inform and augment model-based systems.
We argue that data-driven methods can be used to: (1)
evaluate the differences between closely-related systems; (2)
assess the impact of specific design decisions made in those
systems on user behaviors; and (3) evaluate the potential
application of data-driven hint generation across systems. To
that end we will survey relevant prior work on model-based
and data-driven tutoring. We will describe two
closely-related tutoring systems and data collected from them. We
will then present a series of analyses using state-based
methods and discuss the conclusions that we drew from them.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. BACKGROUND</title>
    </sec>
    <sec id="sec-3">
      <title>2.1 Model-Based Tutoring</title>
      <p>
        Model-based tutoring systems take a classical expert-systems
approach to tutoring. They are typically based upon a
strong domain model composed of declarative rules and facts
representing domain principles and problem-solving actions
coupled with an automatic problem solver. This knowledge
base is used to structure domain knowledge, define
individual problems, evaluate candidate solutions, and to provide
student guidance. Novices typically interact with the system
through problem solving with the system providing solution
validation, automatic feedback, pedagogical guidance, and
additional problem-solving tasks. The Sherlock 2 system, for
example, was designed to teach avionics technicians about
appropriate diagnostic procedures [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The system relies on
a domain model that represents the avionics devices being
tested, the behavior of the test equipment, and rules about
expert diagnostic methods. Sherlock 2 uses these models
to pose dynamic challenges to problem solvers, to simulate
responses to their actions, and to provide solution guidance.
Andes [
        <xref ref-type="bibr" rid="ref18 ref19 ref20">19, 18, 20</xref>
        ] and Pyrenees [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] are closely-related
model-driven ITSs in the domains of physics and probability. They
were originally developed at the University of Pittsburgh
under the direction of Dr. Kurt VanLehn. Like other
model-based systems, they rely on a rule-based domain model and
automatic problem solvers that treat the domain rules as
problem-solving steps. They distinguish between
higher-level domain concepts, such as Bayes' Rule, and atomic steps,
such as variable definitions. Principles are defined by a
central equation (e.g. p(A|B) = p(B|A)p(A)/p(B)) and
encapsulate a set of atomic problem-solving steps, such as
writing the equation and defining the variables within it.
The systems are designed to function as homework-helpers,
with students logging into the system and being assigned or
selecting one of a set of predefined problems. Each problem
is associated with a pre-compiled solution graph that defines
the set of possible solutions and problem-solving steps. The
system uses a principle-driven automated problem solver to
compile these graphs and to identify the complete solution
paths. The solver is designed to implement the Target
Variable Strategy (TVS), a backward-chaining problem solving
strategy that proceeds from a goal variable (in this case
the answer to the problem) via principle applications to the
given information. The TVS was designed with the help of
domain experts and guides solvers to define basic solution
information (e.g. given variables) and then to proceed from
the goal variable and use principles to define it in terms of
the given variables.
      </p>
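<p>The backward chaining of the TVS described above can be illustrated with a short sketch. The principle table, problem, and helper names below are our own illustrations, not the tutors' actual knowledge base:</p>

```python
# A minimal sketch of the Target Variable Strategy (TVS): backward-chain
# from the goal variable, applying principles until every unknown is
# defined in terms of the given variables. Illustrative only.

# Each principle: name -> (variable it defines, variables it requires)
PRINCIPLES = {
    "Bayes' Rule for p(A|B)": ("p(A|B)", ["p(B|A)", "p(A)", "p(B)"]),
    "Complement Theorem for p(B)": ("p(B)", ["p(~B)"]),
}

def tvs(goal, given, steps=None):
    """Return an ordered list of problem-solving steps, or None."""
    if steps is None:
        steps = []
    if goal in given or ("define " + goal) in steps:
        return steps
    for name, (defines, requires) in PRINCIPLES.items():
        if defines == goal:
            for var in requires:           # first define the inputs ...
                if tvs(var, given, steps) is None:
                    return None
            steps.append("apply " + name)  # ... then apply the principle
            steps.append("define " + goal)
            return steps
    return None  # goal is neither given nor derivable

print(tvs("p(A|B)", {"p(B|A)", "p(A)", "p(~B)"}))
```

<p>The solver recurses from the goal variable p(A|B) down to the given variables, so the emitted step order matches the backward-chaining behavior described above.</p>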
      <p>
        Students working with Andes use a multi-modal user
interface to write equations, define variables, and engage in other
atomic problem-solving steps. A screenshot of the Andes UI
can be seen in Figure 1. Andes allows students to solve
problems flexibly, completing steps in any order so long as they
are valid [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. A step is considered to be valid if it matches
one or more entries in the saved solution paths and all
necessary prerequisites have been completed. Invalid steps are
marked in red, but no other immediate feedback is given.
Andes does not force students to delete or fix incorrect
entries, as they do not affect the solution process. In
addition to validating entries, the Andes system also uses the
pre-compiled solution graphs to provide procedural guidance
(next-step-help). When students request help, the system
will map their work to the saved solution paths. It will then
select the most complete solution and prompt them to work
on the next available step.
      </p>
      <p>
        One of the original goals of the Andes system was to
develop a tutor that operated as an "intelligent worksheet."
The system was designed to give students the freedom to
solve problems in any order and to apply their preferred
solution strategy. The system extends this freedom by
allowing invalid steps in an otherwise valid solution and by
allowing students to make additional correct steps that do
not advance the solution state or are drawn from multiple
solution paths. This was motivated in part by a desire to
make the system work in many different educational
contexts where instructors have their own preferred methods
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The designers of Andes also consciously chose only to
provide advice upon demand when the students would be
most willing to accept it. For the students, however,
particularly those with poor problem-solving skills, this passive
guidance and comparative freedom can be problematic as it
does not force them to adhere to a strategy.
      </p>
      <p>
        This problem motivated the development of Pyrenees.
Pyrenees, like Andes, acts as a homework helper and supports
students with on-demand procedural and remediation help. It
uses an isomorphic domain model with the same principles,
basic steps, problems, and solution paths. Unlike Andes,
however, Pyrenees forces students to apply the
Target Variable Strategy during problem solving. It also requires
them to repair incorrect entries immediately before moving
on. Students are guided through the solution process with
a menu-driven interface, shown in Figure 2. At each step,
the system asks students what they want to work on next
and permits them to make any valid step that is consistent
with the TVS. Chi and VanLehn [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] conducted a study of the
two systems and found that scaffolding the TVS in Pyrenees
helped to eliminate the gap between high and low learners.
This effect was observed both in the original domain where
it was taught (in their case, probability) and, via transfer, in
a new domain (physics), where students used Andes alone.
      </p>
    </sec>
    <sec id="sec-4">
      <title>2.2 Data-Extraction and Data-Driven Tutoring</title>
      <p>
        One of the longstanding goals of educational data-miners is
to support the development of data-driven tutoring systems.
Such systems use past student data to structure pedagogical
and domain knowledge, administer conceptual and
pedagogical advice, or evaluate student performance and needs. A
number of attempts have been made to address these goals.
One of the most successful data-driven systems is the Hint
Factory [
        <xref ref-type="bibr" rid="ref1 ref17 ref2">1, 2, 17</xref>
        ]. The Hint Factory takes an MDP-based
approach to hint generation. It takes as input a set of prior
student logs for a given problem, represented as a network of
interactions [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. Each vertex in this network represents the
state of a student's partial solution at some point during the
problem solving process, and each edge represents an action
that takes the student from one state to another. A complete
solution is represented as a path from the initial state to a
goal state. Each state in the interaction network is assigned
a weight via a value-iteration algorithm. A new student
requesting a hint is matched to a previously observed state and
given context-sensitive advice. If, for example, the student is
working on a problem that requires Bayes' Rule and has
already defined p(A), p(B), and p(B|A), then the Hint Factory
would first prompt them to consider defining p(A|B), then
it would point them to Bayes' Rule, before finally showing
them the equation p(A|B) = p(B|A)p(A)/p(B).
These hints are incorporated into existing tutoring systems
in the form of a lookup table that provides state-specific
advice. When a user asks for help the tutor will match their
current state to an index state in the lookup table and will
prompt them to take the action that will lead them to the
highest value neighboring state. If their current state is not
found then the tutor will look for a known prior state or
will give up. The Hint Factory has been applied successfully
in a number of domains including logic proofs [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], data
structures [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], and programming [
        <xref ref-type="bibr" rid="ref10 ref13 ref15">15, 10, 13</xref>
        ]. Researchers
have also explored other related methods for providing
data-driven hints. These include alternative state representations
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], path construction algorithms [
        <xref ref-type="bibr" rid="ref14 ref16">16, 14</xref>
        ], and
example-based model-construction [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
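<p>The MDP-style hint generation described above can be sketched in a few lines. The toy interaction network, goal reward, step cost, and discount factor below are our own assumptions for illustration, not the Hint Factory's actual parameters:</p>

```python
# A minimal sketch of MDP-style hint generation over a deterministic
# interaction network: states are partial solutions, edges are student
# actions, and value iteration assigns each state a score from which
# next-step hints are read off. Illustrative network and values only.

# edges: state -> {action: next_state}
EDGES = {
    "start": {"define p(A)": "A", "define p(B)": "B"},
    "A":     {"define p(B)": "AB"},
    "B":     {"define p(A)": "AB"},
    "AB":    {"apply Bayes' Rule": "goal"},
    "goal":  {},
}
GOAL_REWARD, STEP_COST, GAMMA = 100.0, -1.0, 0.9

def value_iteration(edges, goal, sweeps=50):
    values = {s: 0.0 for s in edges}
    for _ in range(sweeps):
        for s, actions in edges.items():
            if s == goal:
                values[s] = GOAL_REWARD
            elif actions:
                # Best action: step cost plus discounted successor value
                values[s] = max(STEP_COST + GAMMA * values[t]
                                for t in actions.values())
    return values

def hint(state, edges, values):
    """Recommend the action leading to the highest-value neighbor."""
    actions = edges[state]
    return max(actions, key=lambda a: values[actions[a]]) if actions else None

values = value_iteration(EDGES, "goal")
print(hint("AB", EDGES, values))   # the only available step: apply Bayes' Rule
```

<p>A real interaction network would be built from prior student logs and compiled into the lookup table described above; the hint policy simply prompts the action leading to the highest-value neighboring state.</p>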
      <p>The primary goal of the Hint Factory is to leverage prior
data to provide optimal state-specific advice. By calculating
advice on a per-state basis, the system is able to adapt to
students' specific needs by taking into account both their
current state and the paths that they can take to reach the
goal. As a consequence the authors of the Hint Factory
argue that this advice is more likely to be in the students'
Zone of Proximal Development and thus more responsive to
their needs than a less-sensitive algorithm.</p>
    </sec>
    <sec id="sec-6">
      <title>3. METHODS</title>
      <p>In order to investigate the application of data-driven
methods to model-based tutoring systems, we collected data from
two studies conducted with Andes and Pyrenees in the
domain of probability. We then transformed these datasets
into interaction networks, consisting of states linked with
actions. We used this representation to perform a variety of
quantitative and qualitative analyses with the goal of
evaluating the differences between the two systems and the
impact of the specific design decisions that were made in each.</p>
    </sec>
    <sec id="sec-7">
      <title>3.1 The Andes and Pyrenees Datasets</title>
      <p>
        The Andes dataset was drawn from an experiment
conducted at the University of Pittsburgh [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This study was
designed to assess the differential impact of instruction in
Andes and Pyrenees on students' meta-cognitive and
problem-solving skills. Participants in this study were college
undergraduates who were required to have taken high-school level
algebra and physics but not to have taken a course in
probability or statistics. The participants were volunteers and
were paid by time, not performance.
      </p>
      <p>Forty-four students completed the entire study. However
for the purposes of the present analysis, we drew on all
66 students who completed at least one problem in
Andes-Probability. This is consistent with prior uses of the Hint
Factory which draw from all students including those who
did not complete the problem. The Pyrenees-Probability
logs from this study were not used due to problems with the
data format that prevented us from completing our analysis.
From this dataset we drew 394 problem attempts covering
11 problems. The average number of steps required to solve
the problems was 17.6. For each problem we analyzed
between 25 and 72 problem attempts, with an average of 35.8
attempts per problem. Some attempts were from the same
student, with at most two successful attempts per student.
Over all problems, 81.7% of the attempts were successful,
with the remainder being incomplete attempts.</p>
      <p>The Pyrenees dataset was drawn from a study of 137
students conducted in the 200-level Discrete Mathematics course
in the Department of Computer Science at North Carolina
State University. This study used the same probability
textbook and pre-training materials as those used in the Andes
study. The students used Pyrenees as part of a homework
assignment, in which they completed 12 problems using the
tutoring system. One of these problems was not represented
in the Andes dataset. We therefore excluded it from our
analysis, leaving 11 shared problems.</p>
      <p>Unlike the Andes students, however, the Pyrenees students
were not always required to solve every problem. In this
study the system was configured to randomly select some
problems or problem steps to present as worked examples
rather than as steps to be completed. In order to ensure
that the results were equivalent, we excluded the
problem-level worked examples and any attempt with a step-level
worked example from our analysis. As a consequence, each
problem included a different subset of these students. For
each problem we analyzed between 83 and 102 problem
attempts, with an average of 90.8 attempts per problem. Some
attempts were from the same student, with at most one
successful attempt per student. Over all problems, 83.4% of
the attempts were successful.</p>
    </sec>
    <sec id="sec-8">
      <title>3.2 State and Action Representations</title>
      <p>
        In order to compare the data from both tutors, we
represented each problem as an interaction network, a
representation used originally in the Hint Factory [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In the
network, a vertex, or state, represents the sum total of a
student's current problem-solving steps at a given time during
a problem-solving attempt. Because Andes permits flexible
step ordering while Pyrenees does not, we chose to represent
the problem-solving state s_t as the set of valid variables and
equations defined by the student at time t.
      </p>
      <p>A variable is a probabilistic expression, such as P(A ∪ B),
that the student has identified as important to solving the
problem, for which the probability is known or sought. An
equation represents the application of a principle of
probability, which relates the values of defined variables, such as
the Complement Theorem, P(A) + P(¬A) = 1. Because
such equations can be written in many algebraically
equivalent ways, we represent each equation as a 2-tuple,
consisting of the set of variables included in the equation (e.g.
{P(A), P(¬A)}) and the principle being applied (e.g.
Complement Theorem). Because we only represent valid
equations, this representation uniquely identifies any equation
for a given problem. Because we used the same state
representation for both tutors, we were able to compare states
directly across tutors.</p>
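<p>The shared encoding described above can be sketched as follows; the helper names and sample principle labels are our own illustrations:</p>

```python
# A minimal sketch of the shared state representation: an equation is
# a (frozenset-of-variables, principle) 2-tuple, and a state is the
# set of everything the student has validly defined. Illustrative only.

def make_equation(variables, principle):
    # Order-free encoding: algebraically equivalent forms of the same
    # principle application collapse to the same 2-tuple.
    return (frozenset(variables), principle)

def make_state(variables, equations):
    # A state is the set of valid variables and equations defined so far.
    return frozenset(variables) | frozenset(equations)

# P(A) + P(~A) = 1 written two ways yields one canonical equation:
eq1 = make_equation(["P(A)", "P(~A)"], "Complement Theorem")
eq2 = make_equation(["P(~A)", "P(A)"], "Complement Theorem")

# States built from logs of different tutors can be compared directly:
andes_state    = make_state({"P(A)", "P(~A)"}, {eq1})
pyrenees_state = make_state({"P(~A)", "P(A)"}, {eq2})
print(andes_state == pyrenees_state)
```

<p>Because both the variable set and the equation tuples are order-free and hashable, state equality (and hence cross-tutor comparison) reduces to ordinary set equality.</p>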
      <p>Additionally, we opted to ignore incorrect entries. Pyrenees
prevents students from applying the principles of
probability improperly and forces them to correct any mistakes
immediately; any errors in the student logs are therefore
immediately removed, making the error paths uninformative. Andes,
by contrast, gives students free rein when writing equations
and making other entries. This freedom resulted in
syntactic errors and improper rule-application errors arising in our
dataset. The meaning of these invalid equations is inherently
ambiguous and therefore difficult to incorporate into a state
definition. However, such errors are immediately flagged by
the system and may be ignored by the student without
consequence, as they do not affect the validity of the answer;
we therefore chose to ignore them as well.</p>
      <p>An edge, or action, in our network represents the correct
application of a rule or a correct variable de nition and leads
to a transition from one state to another. For the present
dataset and state representation, the possible actions were
the definition or deletion of variables or equations. Each of
these actions was possible in both tutors.</p>
    </sec>
    <sec id="sec-9">
      <title>4. ANALYSIS</title>
      <p>In order to develop a broader understanding of our datasets,
we first visualized the interaction network for each problem
as a weighted, directed graph. We included attempts from
both Andes and Pyrenees in the network, and weighted the
edges and vertices by the frequency with which they appeared
in the logs. We annotated each state and edge with the
weight contributed by each tutor. Two examples of these
graphs are given in Figure 3.</p>
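<p>Building such a weighted network from raw attempt logs can be sketched as follows; the log format and toy attempts are illustrative assumptions rather than the actual datasets:</p>

```python
# A minimal sketch of building the weighted interaction network:
# each attempt is a sequence of states, and vertex and edge weights
# count occurrences, tallied separately per tutor. Illustrative only.
from collections import Counter

def build_network(attempts_by_tutor):
    vertex_weight, edge_weight = Counter(), Counter()
    for tutor, attempts in attempts_by_tutor.items():
        for path in attempts:
            for state in path:
                vertex_weight[(tutor, state)] += 1
            for src, dst in zip(path, path[1:]):
                edge_weight[(tutor, (src, dst))] += 1
    return vertex_weight, edge_weight

attempts = {
    "andes":    [["s0", "s1", "s2"], ["s0", "s2"]],
    "pyrenees": [["s0", "s1", "s2"]],
}
vw, ew = build_network(attempts)
print(vw[("andes", "s0")])             # 2
print(ew[("pyrenees", ("s0", "s1"))])  # 1
```

<p>Keying the counters by tutor makes it straightforward to annotate each state and edge with the weight contributed by each system, as described above.</p>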
      <p>Throughout this section, we will use these graphs to address
the points we outlined at the end of Section 1. We begin
with a case study from one problem and will explore the
student problem solving strategies using our graph
representation. We will then compare the Andes and Pyrenees
Systems with a variety of metrics based on this
representation. We will relate our observations back to the design
decisions of each system and identify evidence that may
support or question these decisions. Finally, we will show how
the analysis methods associated with data-driven hint
generation can be used to validate some of these findings.</p>
    </sec>
    <sec id="sec-10">
      <title>4.1 Case Study: Problem Ex242</title>
      <p>The graphical representation of a problem is very helpful for
giving a high-level overview and performing
qualitative analysis. Problem Ex242, shown in Figure 3,
presents an interesting scenario for a number of reasons. The
problem was the 10th in a series of 12 practice problems, and
asked the following:</p>
      <p>Events A, B and C are mutually exclusive and
exhaustive events with p(A) = 0.2 and p(B) =
0.3. For an event D, we know p(D|A) = 0.04,
p(D|B) = 0.03, and p(C|D) = 0.3. Determine
p(B|D).</p>
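<p>As a check on the arithmetic, the Bayes' Rule route to the answer can be computed directly; the variable names below are ours, and this is our own verification rather than a transcript of either tutor's solution path:</p>

```python
# Since A, B, C are mutually exclusive and exhaustive,
# p(D) = p(D∩A) + p(D∩B) + p(D∩C), and p(D∩C) = p(C|D)p(D),
# so p(D) can be solved for in closed form.
p_A, p_B = 0.2, 0.3
p_D_given_A, p_D_given_B, p_C_given_D = 0.04, 0.03, 0.3

# p(D) = p(A)p(D|A) + p(B)p(D|B) + p(C|D)p(D)
p_D = (p_A * p_D_given_A + p_B * p_D_given_B) / (1 - p_C_given_D)

# Bayes' Rule: p(B|D) = p(D|B)p(B) / p(D)
p_B_given_D = p_D_given_B * p_B / p_D

print(round(p_B_given_D, 3))   # 0.371
```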
      <p>The problem is notable in the Pyrenees dataset because it
was the only one in which the students were split almost
evenly between two solution paths. For most of the problems
in the dataset the vast majority of students followed the
optimal solution path, with only a few finding alternatives.
The ideal solution path, as suggested by Pyrenees' domain
model, employed repeated applications of the Conditional
Probability Theorem: P(A ∩ B) = P(A|B)P(B), which the
problem was designed to teach. The students had been
previously exposed to Bayes' Theorem, however, and over half
of them chose to apply it instead. This allowed them to
circumvent one variable de nition and two applications of
the Conditional Probability Theorem, achieving a slightly
shorter solution path. We make no argument here about which
path the tutor should encourage students to take, but it is
worth noting that the Hint Factory, when trained on the
Pyrenees data for this problem, recommends the shorter,
more popular path.</p>
      <p>The Andes dataset gives us a very different set of insights
into this problem. Because Andes lacks the strong
process scaffolding of Pyrenees, students were able to make a
wider variety of choices, leading to a graph with many more,
less populous states. While almost every state and edge in
the Pyrenees graph represents multiple students, the
Andes graph contains a number of paths, including solutions,
that were reached by only one student. In some state-based
analyses the authors choose to omit these singleton states,
for instance when generating hints. We have chosen to
include them as they represent a fair proportion of the Andes
data. For instance, 62 of the 126 Andes states for Ex242
were singleton states.</p>
      <p>Interestingly, while there were small variations among their
solutions, all of the Andes students chose to apply Bayes'
Rule rather than relying solely on the Conditional
Probability Theorem as suggested by Pyrenees. This, coupled with
the strong proportion of Pyrenees students who also chose
the Bayes' Rule solution, indicates that the solution offered
by Pyrenees may be unintuitive for students, especially if
they have recently learned Bayes' Rule. Again, this can be
interpreted as evidence that Pyrenees' strong guidance did
have an impact on students' problem solving strategies, but
it also raises concerns about how reasonable this guidance
will appear to the students. Regardless of one's
interpretation, an awareness of a trend like this can help inform the
evolution of model-based tutors like Andes and Pyrenees.</p>
    </sec>
    <sec id="sec-11">
      <title>4.2 Comparing datasets</title>
      <p>
        We now turn to qualitatively comparing the datasets. While
it is not a common practice to directly compare data from
different tutors, we argue that it is appropriate, especially
in this context. In longstanding tutoring projects it is
common for developers and researchers to make many
substantive changes. The Andes system itself has undergone
substantial interface changes over the course of its development
[
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. These changes can alter student behavior in
substantial ways, and it is important for researchers to consider
how they affect not just learning outcomes but also
problem-solving strategies, as was investigated by Chi et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
In many respects the close relationship between Andes and
Pyrenees makes them analogous to different versions of the
same tutor and the presence of an isomorphic knowledge
base and problem set makes it possible for us to draw
meaningful comparisons between students. In this section we will
inspect how the scaffolding design decisions made when
constructing the tutors affected the problem-solving strategies
exhibited by the students.
      </p>
      <p>A visual inspection of the state graphs for each problem
revealed that significant portions of each graph were shared
between the two datasets and portions that were represented
in only one of the two. Despite the fact that students using
Andes were capable of reaching any of the states available
to students in Pyrenees, many Pyrenees states were never
discovered by Andes students. As noted in Section 4.1, this
suggests that guidance from the Pyrenees tutor is successful
in leading students down solution paths that they would not
otherwise have discovered, possibly applying skills that they
would otherwise not have used.</p>
      <p>To quantify these findings, we calculated the relative
similarity of students in each tutor. For a given problem, we
defined the state-similarity between datasets A and B as the
probability that a randomly selected state from a student in
A will be passed through by a randomly selected student
in B. Recall from Section 3.2 that our state representation
allows us to directly compare states across tutors. By this
definition, the self-similarity of a dataset is a measure of
how closely its students overlap each other, while the
cross-similarity is a measure of how closely its students overlap
with the other dataset. Similarity measures for the datasets
can be found at the top of Table 1.</p>
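<p>One way to compute the state-similarity defined above is the following sketch, which represents each student by the set of states they visited; averaging uniformly over state-student pairs is our reading of the definition, and the toy data is illustrative:</p>

```python
# A minimal sketch of the state-similarity metric: the probability that
# a randomly selected state from a student in dataset A is passed
# through by a randomly selected student in B. Asymmetric by design.

def similarity(dataset_a, dataset_b):
    """Average over (state of a student in A, student in B) pairs of
    the indicator that the student in B visited that state."""
    total, hits = 0, 0
    for student_a in dataset_a:
        for state in student_a:
            for student_b in dataset_b:
                total += 1
                if state in student_b:
                    hits += 1
    return hits / total

# Toy datasets: each student is the set of states they visited.
andes    = [{"s0", "s1", "s2"}, {"s0", "s3"}]
pyrenees = [{"s0", "s1"}, {"s0", "s1", "s2"}]

print(similarity(andes, andes))      # self-similarity of Andes
print(similarity(pyrenees, andes))   # cross-similarity: Pyrenees vs. Andes
```

<p>The same function computes self-similarity (a dataset against itself) and cross-similarity (one dataset against the other), and the action-similarity discussed below follows the same pattern with actions in place of states.</p>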
      <p>Predictably, both datasets have higher self-similarity than
cross-similarity, with Pyrenees showing higher self-similarity
than Andes. This indicates that Pyrenees students chose
more homogeneous paths to the goal. This is reasonable
and consistent with the heavy sca olding that is built into
the system. It is important to note that our similarity
metrics are not symmetric. The cross-similarity of Pyrenees
with Andes is higher than the reverse. This indicates that
the paths taken by Pyrenees students are more likely to have
been observed by Andes students than vice-versa. This has
important implications for designers who are interested in
collecting data from a system that is undergoing
modifications. If a system becomes increasingly scaffolded and
restrictive over time, past data will remain more relevant than
in a system that is relaxed. In many ways this simply
reflects the intuition that allowing students to explore a state
space more fully will produce more broadly useful data, and
restricting students will produce data that is more narrowly
useful. Note that here we are only observing trends, and we
make no claims of statistical significance.</p>
      <p>In our analysis, we found that, within both datasets, many
solution paths or sub-paths differed only in the order that
actions were performed. In our domain, many actions do
not have ordering constraints. It is possible, for example,
to define the variables A and B in either order, and the
resulting solution paths would deviate from one another. We
thus sought to determine how much of the observed
difference between our two datasets was due to these ordering
effects. To that end we define the action-similarity between
datasets A and B as the probability that a randomly
selected action performed by a student in A will have been
performed by a randomly selected student in B. These
values are shown in the bottom of Table 1, and each of the
trends observed for state-similarity hold, with predictably
higher similarity values.</p>
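      For concreteness, the action-similarity measure can be sketched as follows. This is a minimal illustration under our own assumptions about data layout (each student reduced to the set of actions they performed); it is not the authors' implementation.

```python
def action_similarity(dataset_a, dataset_b):
    # Probability that a randomly selected action performed by a student
    # in dataset A was also performed by a randomly selected student in
    # dataset B.  Each student is modeled as a set of action labels.
    hits = total = 0
    for a_actions in dataset_a:
        for b_actions in dataset_b:
            for act in a_actions:
                total += 1
                hits += act in b_actions  # bool counts as 0/1
    return hits / total if total else 0.0

# Toy datasets with hypothetical action labels.
A = [{"define_A", "define_B", "apply_p1"}, {"define_A", "apply_p2"}]
P = [{"define_A", "define_B", "apply_p1"}]
print(action_similarity(A, P))  # 0.8
print(action_similarity(P, A))  # ~0.667: the measure is asymmetric
```

      The asymmetry visible in the toy example mirrors the asymmetry discussed above: actions common in the smaller, more constrained dataset are well covered by the larger one, but not vice-versa.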
      <p>It is notable that the similarity between Pyrenees and Andes
is almost as high as Andes' self-similarity, indicating that the
actions taken by Pyrenees students are almost as likely to
be observed in Andes students as in Pyrenees students. This
suggests that, for the most part, the Andes students
performed a superset of the actions performed by the Pyrenees
students. Thus the impact of Pyrenees is most visible in the
order of execution, not the actions chosen. This is consistent
with the design goals of Pyrenees, which was set up to guide
students along the otherwise unfamiliar path of the TVS.
We also opted to examine the impact of the variable definitions
on our evaluation. As noted above, variable definitions
are an atomic action. They do not depend upon any event
assertion and thus, unlike the principles, have no ordering
constraints. We did so with the hypothesis that this would
increase the similarity metrics for the datasets by
eliminating the least constrained decisions from consideration. Our
results are shown in Table 2 below. Contrary to our
expectations, this actually reduced the similarity both within
and across the datasets, with the exception of Pyrenees'
self-similarity. Thus the unconstrained variable definitions did
not substantially contribute to the dissimilarity. Rather,
most of the variation lay in the order of principle
applications.</p>
    </sec>
    <sec id="sec-12">
      <title>4.3 Similarity to an “ideal” solution</title>
      <p>We now turn to exploring how well the ideal solution was
represented in the datasets. For both tutors the ideal
solution is the pedagogically-desirable path constructed via the
TVS. Our measure of cross-similarity between two datasets
can also be applied between a single solution path and a
dataset by treating the single solution as a set of one. We
can thus measure the average likelihood of an ideal solution
state appearing in a student's solution from each dataset.
The results of this calculation are shown in Tables 1 and 2,
using both the state- and action-similarities explained
earlier. Predictably, the solution has a high similarity with
Pyrenees students, as these students are scaffolded tightly
and offered few chances to deviate from the path.
As Table 3 shows, Pyrenees students were funneled almost
exclusively to the ideal solution on the majority of
problems, even if their paths to the solution were variable. We
found only one problem, Ex152, where the Pyrenees
students missed the ideal path. That was traced to a
programming error that forced students along a similar path.
Otherwise, there was only one problem, Ex242 (discussed
in Section 4.1), where a meaningful percentage of students
chose a different solution. The Andes students, by contrast,
were much less likely to finish in the ideal solution state, but
this was also problem-dependent.</p>
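      Treating the ideal solution as a dataset of one reduces the calculation to a simple average, which can be sketched as follows (an illustrative sketch; the state labels and set representation are our assumptions):

```python
def ideal_similarity(ideal_states, dataset):
    # Average likelihood that a state on the ideal path appears in a
    # student's solution: the cross-similarity measure applied with the
    # ideal solution treated as a dataset of one.
    per_student = [
        sum(1 for s in ideal_states if s in student) / len(ideal_states)
        for student in dataset
    ]
    return sum(per_student) / len(per_student)

# Toy data: solution states as opaque labels.
ideal = {"s0", "s1", "s2", "s3"}
students = [{"s0", "s1", "s2", "s3"}, {"s0", "s2", "x9"}]
print(ideal_similarity(ideal, students))  # 0.75
```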
    </sec>
    <sec id="sec-13">
      <title>4.4 Applications of the Hint Factory</title>
      <p>
        Finally, having shown that the datasets differ, and that these
differences are consistent with the differing design choices of
the two tutors, we sought to determine what effect those
differences would have on data-driven hint generation. Our
goal was to determine how applicable a hint model of the
type produced by the Hint Factory would be for one dataset
if it was trained on another. To that end we performed a
modified version of the Cold Start Experiment [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], which
is designed to measure the number of state-specific hints
that the Hint Factory can provide given a randomly selected
dataset. The Cold Start Experiment functions like
leave-one-out cross-validation for state-based hint generation. In
the original Cold Start Experiment, one student was selected
at random and removed from the dataset to represent a
“new” student using the tutor. Each remaining student in
the dataset was then added, one at a time, in a random
order to the Hint Factory's model. On each iteration, the
model was updated and the percentage of states on the “new”
student's path for which a hint is available was calculated.
This was repeated a desired number of times with new students
to account for ordering effects.
      </p>
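      The procedure can be sketched as follows. This is a simplified illustration, not the Hint Factory implementation: we reduce "a hint is available for a state" to "the state has already been observed," and represent each student as the ordered list of states on their path; both are our assumptions.

```python
import random

def cold_start_curve(paths, trials=200, seed=1):
    # Sketch of the Cold Start procedure: repeatedly hold out one "new"
    # student, add the remaining students one at a time in random order,
    # and record the fraction of the held-out student's states covered
    # after each addition.
    rng = random.Random(seed)
    n_added = len(paths) - 1
    curve = [0.0] * n_added
    for _ in range(trials):
        pool = paths[:]
        rng.shuffle(pool)
        new, rest = pool[0], pool[1:]     # held-out "new" student
        seen = set()
        for i, path in enumerate(rest):   # add students one at a time
            seen.update(path)
            curve[i] += sum(s in seen for s in new) / len(new)
    return [c / trials for c in curve]   # average over random orderings

# Toy paths through a small state space.
paths = [["a", "b", "g"], ["a", "c", "g"], ["a", "b", "g"], ["a", "g"]]
curve = cold_start_curve(paths)
# Coverage is non-decreasing as more students are added.
assert all(x <= y + 1e-9 for x, y in zip(curve, curve[1:]))
```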
      <p>For the present study we calculated cold-start curves for
both the Pyrenees and Andes datasets. We also calculated
curves using the opposing dataset to illustrate the growth
rate for cross-tutor hints. For these modified curves we
selected the hint-generating students from the opposing dataset.
All four curves are shown in Figure 4. Here AvA and PvP
designate the within tutor curves for Andes and Pyrenees
respectively while PvA and AvP designate the cross-tutor
curves for hints from Pyrenees provided to Andes users and
vice-versa. Figure 4 represents an average over all problems,
and therefore the x-axis extends only as far as the minimum
number of students to complete a problem. As the curves
illustrate, the within-tutor curves reach high rates of
coverage relatively quickly with PvP reaching a plateau above
95% after 21 students and AvA reaching 85%.</p>
      <p>The cross-tutor curves, by contrast, reach much lower
limits. AvP reaches a plateau of over 75% coverage, while PvA
reaches a plateau of 60% coverage. This reflects the same
trends observed in Tables 1 and 2, where Andes better
explains the Pyrenees data than vice-versa; however, neither
dataset completely covers the other. On the one hand this
result is somewhat problematic, as it indicates that prior data
offers only limited coverage for novel tutors or novel versions of a
system in the same domain. Clearly a substantive interface
and scaffolding change of the type made in Pyrenees can
change the state space sufficiently that we cannot trivially
rely on our prior data. On the other hand, while the
cross-application of data does have upper limits, those limits are
comparatively high. Clearly data from a prior system can be
reused and can serve as a reliable baseline for a novel system,
with the caveat that additional exploratory data is required.</p>
    </sec>
    <sec id="sec-14">
      <title>5. DISCUSSION AND CONCLUSION</title>
      <p>Our goal in this paper was to evaluate the application of
data-driven methods such as the Hint Factory to
model-based tutoring systems. To that end we analyzed and
compared datasets collected from two closely-related tutoring
systems: Andes and Pyrenees. Through our analysis we
sought to: (1) evaluate the differences between closely-related
systems; (2) assess the impact of specific design decisions
made in those systems on user behaviors; and (3)
evaluate the potential application of data-driven hint generation
across systems.</p>
      <p>We found that, while the systems shared isomorphic domain
models, problems, and ideal solutions, the observed user
behaviors differed substantially. Students using the Andes
system explored the space more widely, were more prone to
identify novel solutions, and rarely followed the ideal
solution path. Students in Pyrenees, by contrast, were far more
homogeneous in their solution process and were limited in
the alternative routes they explored. Contrary to our
expectations, we found that this variation was due not to simple
ordering variations in the simplest of steps but to
alternative strategy selection for the higher-level domain principles.
This is largely consistent with the design decisions that
motivated both systems and with the results of prior studies.
We also found that the state-based hint generation method
used in the Hint Factory can be applied to the Andes and
Pyrenees data given a suitable state representation. For this
analysis we opted for a set-based representation given the
absence of strong ordering constraints across the principles.
We then completed a cold-start analysis to show that
cross-tutor data can be used to bootstrap the construction
of hints for a novel system but does not provide complete
coverage.</p>
      <p>Ultimately we believe that the techniques used for
data-driven hint generation have direct application to
model-based systems. Data-driven analysis can be used to
identify the behavioral differences between closely related
systems and, we would argue, changes from one version of a
system to another. We also found that these changes can
be connected to the specific design decisions made during
development. Further, we found that data-driven methods
can be applied to model-based tutoring data to generate
state-based hints. We believe that hint information of this
type may be used to supplement the existing domain models
to produce more user-adaptive systems. In future work we
plan to apply these analyses to other appropriate datasets
and to test the incorporation of state-driven hints or hint
refinement into existing domain models.</p>
    </sec>
    <sec id="sec-15">
      <title>6. ACKNOWLEDGMENTS</title>
      <p>This work was supported by NSF Grant #1432156, “Educational Data
Mining for Individualized Instruction in STEM Learning
Environments” (Min Chi &amp; Tiffany Barnes, Co-PIs).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Barnes</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Stamper</surname>
          </string-name>
          .
          <article-title>Toward Automatic Hint Generation for Logic Proof Tutoring Using Historical Student Data</article-title>
          .
          <source>In Intelligent Tutoring Systems (ITS)</source>
          , pages
          <fpage>373</fpage>
          –
          <fpage>382</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>T.</given-names>
            <surname>Barnes</surname>
          </string-name>
          and
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Stamper</surname>
          </string-name>
          .
          <article-title>Automatic hint generation for logic proof tutoring using historical data</article-title>
          .
          <source>Educational Technology &amp; Society</source>
          ,
          <volume>13</volume>
          (
          <issue>1</issue>
          ):3–
          <fpage>12</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Chi</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>VanLehn</surname>
          </string-name>
          .
          <article-title>Eliminating the gap between the high and low students through meta-cognitive strategy instruction</article-title>
          .
          <source>In Intelligent Tutoring Systems (ITS)</source>
          , volume
          <volume>5091</volume>
          , pages
          <fpage>603</fpage>
          –
          <fpage>613</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Chi</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>VanLehn</surname>
          </string-name>
          .
          <article-title>Meta-Cognitive Strategy Instruction in Intelligent Tutoring Systems: How, When, and Why</article-title>
          .
          <source>Educational Technology &amp; Society</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>25</fpage>
          –
          <fpage>39</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Corbett</surname>
          </string-name>
          .
          <article-title>Cognitive computer tutors: Solving the two-sigma problem</article-title>
          .
          <source>In User Modeling</source>
          <year>2001</year>
          , pages
          <fpage>137</fpage>
          –
          <fpage>147</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eagle</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Barnes</surname>
          </string-name>
          .
          <article-title>Exploring Differences in Problem Solving with Data-Driven Approach Maps</article-title>
          .
          <source>In Educational Data Mining (EDM)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Eagle</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Barnes</surname>
          </string-name>
          .
          <article-title>Exploring Networks of Problem-Solving Interactions</article-title>
          .
          <source>In Learning Analytics (LAK)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Fossati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Eugenio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ohlsson</surname>
          </string-name>
          .
          <article-title>I learn from you, you learn from me: How to make iList learn from students</article-title>
          .
          <source>In Artificial Intelligence in Education (AIED)</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Hayes-Roth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Waterman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D. B.</given-names>
            <surname>Lenat</surname>
          </string-name>
          .
          <source>Building Expert Systems</source>
          . Addison-Wesley Publishing Company Inc., Reading, Massachusetts, U.S.A.,
          <year>1983</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Peddycord III</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Barnes</surname>
          </string-name>
          .
          <article-title>Building Games to Learn from Their Players: Generating Hints in a Serious Game</article-title>
          .
          <source>In Intelligent Tutoring Systems (ITS)</source>
          , pages
          <fpage>312</fpage>
          –
          <fpage>317</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Katz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lesgold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Hughes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Eggan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gordin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Greenberg</surname>
          </string-name>
          .
          <article-title>Sherlock 2: An intelligent tutoring system built on the LRDC framework</article-title>
          . In C. P. Bloom and R. B. Loftin, editors,
          <source>Facilitating the Development and Use of Interactive Learning Environments</source>
          , Computers, Cognition, and Work, chapter
          <volume>10</volume>
          , pages
          <fpage>227</fpage>
          –
          <fpage>258</fpage>
          . Lawrence Erlbaum Associates, Mahwah, New Jersey,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Roy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Roberts</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. I.</given-names>
            <surname>Makhoul</surname>
          </string-name>
          .
          <article-title>Toward automatically building tutor models using multiple behavior demonstrations</article-title>
          . In S. Trausan-Matu,
          <string-name>
            <given-names>K. E.</given-names>
            <surname>Boyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Crosby</surname>
          </string-name>
          ,
          and
          K. Panourgia, editors,
          <source>Proceedings of the 12th International Conference on Intelligent Tutoring Systems, LNCS 8474</source>
          , pages
          <fpage>535</fpage>
          –
          <fpage>544</fpage>
          . Springer Verlag,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Peddycord III</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hicks</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Barnes</surname>
          </string-name>
          .
          <article-title>Generating Hints for Programming Problems Using Intermediate Output</article-title>
          .
          <source>In Proceedings of the 7th International Conference on Educational Data Mining (EDM 2014)</source>
          , pages
          <fpage>92</fpage>
          –
          <fpage>98</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Piech</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sahami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Guibas</surname>
          </string-name>
          .
          <article-title>Autonomously Generating Hints by Inferring Problem Solving Policies</article-title>
          .
          <source>In Learning at Scale (LAS)</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>K.</given-names>
            <surname>Rivers</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Koedinger</surname>
          </string-name>
          .
          <article-title>Automatic generation of programming feedback: A data-driven approach</article-title>
          .
          <source>In The First Workshop on AI-supported Education for Computer Science (AIEDCS 2013)</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>K.</given-names>
            <surname>Rivers</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Koedinger</surname>
          </string-name>
          .
          <article-title>Automating Hint Generation with Solution Space Path Construction</article-title>
          .
          <source>In Intelligent Tutoring Systems (ITS)</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Stamper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Eagle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Barnes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Croy</surname>
          </string-name>
          .
          <article-title>Experimental evaluation of automatic hint generation for a logic tutor</article-title>
          .
          <source>I. J. Artificial Intelligence in Education</source>
          ,
          <volume>22</volume>
          (
          <issue>1-2</issue>
          ):3–
          <fpage>17</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>K.</given-names>
            <surname>VanLehn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lynch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. G.</given-names>
            <surname>Schulze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Shapiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shelby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Treacy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Weinstein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wintersgill</surname>
          </string-name>
          .
          <article-title>The Andes physics tutoring system: Five years of evaluations</article-title>
          . In C. Looi,
          <string-name>
            <given-names>G. I.</given-names>
            <surname>McCalla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Bredeweg</surname>
          </string-name>
          , and J. Breuker, editors,
          <source>Artificial Intelligence in Education - Supporting Learning through Intelligent and Socially Informed Technology, Proceedings of the 12th International Conference on Artificial Intelligence in Education, AIED 2005, July 18-22</source>
          ,
          <year>2005</year>
          , Amsterdam, The Netherlands, volume
          <volume>125</volume>
          of Frontiers in
          <source>Artificial Intelligence and Applications</source>
          , pages
          <fpage>678</fpage>
          –
          <fpage>685</fpage>
          . IOS Press,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>K.</given-names>
            <surname>VanLehn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Lynch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K. G.</given-names>
            <surname>Schulze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Shapiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Shelby</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Taylor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Treacy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Weinstein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Wintersgill</surname>
          </string-name>
          .
          <article-title>The Andes physics tutoring system: Lessons learned</article-title>
          .
          <source>I. J. Artificial Intelligence in Education</source>
          ,
          <volume>15</volume>
          (
          <issue>3</issue>
          ):
          <fpage>147</fpage>
          –
          <fpage>204</fpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>K.</given-names>
            <surname>VanLehn</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>van de Sande</surname>
          </string-name>
          .
          <article-title>The Andes physics tutoring system: An experiment in freedom</article-title>
          .
          <source>Advances in Intelligent Tutoring Systems</source>
          , pages
          <fpage>421</fpage>
          –
          <fpage>443</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>