BOTS: Selecting Next-Steps from Player Traces in a Puzzle Game

Drew Hicks, Yihuan Dong, Rui Zhi, Veronica Cateté, Tiffany Barnes
North Carolina State University, 911 Oval Drive, Raleigh, NC 27606
aghicks3@ncsu.edu, ydong2@ncsu.edu, rzhi@ncsu.edu, vmcatete@ncsu.edu, tmbarnes@ncsu.edu

ABSTRACT
In the field of Intelligent Tutoring Systems, data-driven methods for providing hints and feedback are becoming increasingly popular. One such method, Hint Factory, builds an interaction network out of observed player traces. This data structure is used to select the most appropriate next step from any previously observed state, which can then be used to provide guidance to future players. However, this method has previously been employed in systems in which each action a player may take requires roughly similar effort; that is, the “step cost” is constant no matter which action is taken. We hope to apply similar methods to an interaction network built from player traces in our game, BOTS; however, each edge can represent a varied amount of effort on the part of the student. Therefore, a different hint selection policy may be needed. In this paper, we discuss the problems with our current hint policy, which assumes all edges have the same cost, and then discuss potential alternative hint selection policies we have considered.

Keywords
Hint Generation, Serious Games, Data Mining

1. INTRODUCTION
Data-driven methods for providing hints and feedback are becoming increasingly popular, and are especially useful for environments with user- or procedurally-generated content. One such method, Hint Factory, builds an interaction network out of observed player traces. An Interaction Network is a complex network of student-tutor interactions, used to model student behavior in tutors and to provide insight into problem-solving strategies and misconceptions. This data structure can be used to provide hints by treating the Interaction Network similarly to a Markov Decision Process and selecting the most appropriate next step from the requesting user’s current state. This method has successfully been employed in systems in which each action a player may take is of similar cost; for example, in the Deep Thought logic tutor each action is an application of a particular axiom. Applying this method to an environment where actions have different costs, or where outcomes are of varying value, will require some adaptations. In this work, we discuss how we will apply Hint Factory methods to an interaction network built from player traces in a puzzle game, BOTS. In BOTS, each “action” is the set of changes made to the program between runs. Using the current hint selection policy would therefore result in very high-level hints comprising a great number of changes to the student’s program. Since this is undesirable, a different hint selection policy may be needed.

2. DATA-DRIVEN HINTS AND FEEDBACK
In the ITS community, several methods have been proposed for generating hints and feedback from previous observations of users’ solutions or behavior. Rivers et al. propose a data-driven method, based on Hint Factory, to generate hints automatically for novice programmers [8]. They present a domain-independent algorithm which automates hint generation. Their method relies on a solution space: a graph in which each node represents a candidate solution and each edge represents the action used to move from one state to another. Because there are many ways to solve a programming problem, the solution space is huge and thus impractical to use directly. A canonicalization model is used to reduce its size: all states are transformed into canonicalized abstract syntax trees (ASTs), and states whose canonical forms are identical are merged. After the solution space is simplified, hints can be generated. If the current state is incorrect and not in the solution space, a path construction algorithm finds the optimal goal state in the solution space closest to the current state. The algorithm uses change vectors to denote the differences between the current state and the goal state; each change vector can be applied to the current state to form an intermediate state. Intermediate states are measured by a desirability score representing the value of the state, and the path construction algorithm generates optimal next states based on the ranking of these scores. Thus a new path can be formed and added to the
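The canonicalization step can be illustrated with a short sketch. The following Python fragment, using the standard ast module, is our illustration of the general state-merging idea, not Rivers et al.’s implementation; their canonicalizer applies many more normalizations than variable renaming.

```python
import ast

class VariableNormalizer(ast.NodeTransformer):
    """Rename identifiers to order-of-first-use names (v0, v1, ...)."""
    def __init__(self):
        self.names = {}

    def visit_Name(self, node):
        self.names.setdefault(node.id, f"v{len(self.names)}")
        node.id = self.names[node.id]
        return node

def canonical_form(source):
    """A hashable canonical representation of a program. The result is only
    compared for equality, never executed, so renaming even built-ins like
    print is harmless here."""
    return ast.dump(VariableNormalizer().visit(ast.parse(source)))

# Two syntactically different programs collapse into a single state:
assert canonical_form("total = 1\nprint(total + 1)") == \
       canonical_form("x = 1\nprint(x + 1)")
```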

Jin et al. propose linkage graphs to generate hints for programming courses [4]. A linkage graph uses nodes to represent program statements and directed edges to indicate ordered dependencies among those statements. Jin’s approach stores the linkage graph as a matrix for computation. To generate the linkage matrix, they first normalize the variables in each program using an instructor-provided variable specification file. After variable normalization, they sort the statements in three steps: (i) preprocessing, which breaks a single declaration of multiple variables (e.g., int a, b, c) into multiple declaration statements (e.g., int a; int b; int c;); (ii) creating statement sets according to variable dependencies, which puts independent statements into the first set, statements that depend only on statements in the first set into the second set, statements that depend only on statements in the first and second sets into the third set, and so on; and (iii) in-set statement sorting, during which the statements within each set are sorted in decreasing order by their variable signatures. For hint generation, they first generate linkage graphs from a set of correct solutions, as the sources for hints. They also compose the intermediate steps recorded during program development into a large linkage graph, and assign a reward value to each state and to the correct solutions. Then they apply value iteration to create a Markov Decision Process (MDP). When a student requests a hint, the tutor generates a linkage graph for the student’s partial program and tries to find the closest match in the MDP. If a match is found, the tutor generates a hint from the next best state, chosen by highest assigned value. If no match is found in the current MDP, meaning the student is taking a different approach from the existing correct solutions, the tutor tries to modify those correct solutions to fit the student’s program and then provides hints.

Hint Factory [9] is an automatic hint generation technique which uses Markov decision processes (MDPs) to generate contextualized hints from past student data. It consists of two main parts: an MDP generator and a hint provider. The MDP generator computes MDP values for all states seen in previous students’ solutions. In this process, all of the students’ solutions are combined into a single graph, in which each node represents a state and each edge represents an action a student took to move from one state to another. Once the graph is built, the MDP generator uses Bellman backups to assign values to all nodes. After all values are updated, a hint file is generated, which the hint provider uses to provide hints. When a student asks for a hint at an existing state, the hint provider retrieves the current state information and checks whether hints are available for that state. The action that leads to the subsequent state with the highest value is used to generate a hint sequence, which consists of four types of hints ordered from general to detailed. The hint provider then shows the hint from the top of the sequence to the student. Hint Factory has been applied in logic tutors to help students learn logic proofs; the results show that the hint-generating function could provide hints over 80% of the time.

Figure 1: The BOTS interface. The robot’s program is along the left side of the screen. The “toolbox” of available commands is along the top of the screen.

3. BOTS
BOTS is a programming puzzle game designed to teach fundamental ideas of programming and problem-solving to novice computer users. The goal of the BOTS project is to investigate how to best use community-authored content within serious games and educational games. BOTS was inspired by games like LightBot [10] and RoboRally [2], as well as the success of Scratch and its online community [1][5]. In BOTS, players take on the role of programmers writing code to navigate a simple robot around a grid-based 3D environment, as seen in Figure 1. The goal of each puzzle is to press several switches within the environment, which can be done by placing an object or the robot on them. To program the robots, players use simple graphical pseudo-code, allowing them to move the robot, repeat sets of commands using “for” or “while” loops, and re-use chunks of code using functions. Within each puzzle, players’ scores depend on the number of commands used, with lower scores being preferable. In addition, each puzzle limits the maximum number of commands, as well as the number of times each command can be used. For example, in the tutorial levels, a user may only use the “Move Forward” instruction 10 times. Therefore, if a player wants to make the robot walk down a long hallway, it is more efficient to use a loop repeating a single “Move Forward” instruction than to simply chain several “Move Forward” instructions one after the other. These constraints are meant to encourage players to re-use code and optimize their solutions.
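As a concrete illustration of these constraints, consider the hallway example above. The textual program representation and counting rule below are our assumptions for illustration only; BOTS programs are built from graphical blocks, and the game’s actual scoring may differ.

```python
# Hypothetical textual rendering of a BOTS program (our assumption).

# Unrolled: uses the "Move Forward" instruction 10 times, exhausting the
# tutorial's 10-use limit on that instruction.
unrolled = ["Move Forward"] * 10

# Looped: one "Move Forward" inside a repeat-10 loop expresses the same
# behavior while writing the instruction only once.
looped = [{"repeat": 10, "body": ["Move Forward"]}]

def instruction_uses(program):
    """Count instruction usages (a loop's body is written only once)."""
    total = 0
    for step in program:
        if isinstance(step, dict):
            total += instruction_uses(step["body"])
        else:
            total += 1
    return total

assert instruction_uses(unrolled) == 10
assert instruction_uses(looped) == 1
```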
In addition to the guided tutorial mode, BOTS also contains an extensive “Free Play” mode, with a wide selection of puzzles created by other players. The game, in line with the “Flow of Inspiration” principles outlined by Alexander Repenning [7], provides multiple ways for players to share knowledge through authoring and modifying content. Players are able to create their own puzzles to share with their peers, and can play and evaluate friends’ puzzles, improving on past solutions. Features such as peer-authored hints for difficult puzzles and a collaborative filtering approach to rating are planned next steps for the game’s online element. We hope to create an environment where players can continually challenge their peers to find consistently better solutions for increasingly difficult problems.

User-generated content supports replayability and a sense of community for a serious game. We believe that user-created puzzles could improve interest, encouraging students to return to the game to solidify their mastery of old skills and potentially helping them pick up new ones.

4. ANALYSIS

4.1 Dataset
Data for the BOTS studies has come from a middle school computer science enrichment program called SPARCS. In this program, the students attend class on Saturdays for 4 hours, where computer science undergraduates teach them about computational thinking and programming. Students attend a total of 7 sessions, each on a different topic, ranging from security and encryption to game design. The students all attend the same magnet middle school. The demographics for this club are 74.2% male, 25.8% female, 36.7% African American, and 23.3% Hispanic. The students’ grade distribution is 58% 6th grade, 36% 7th grade, and 6% 8th grade.

From these sessions, we collected gameplay data for 20 tutorial puzzles as well as 13 user-created puzzles. With this data, we created an Interaction Network in order to be able to provide hints and feedback for future students [3]. However, using program edits as states, the interaction networks produced were very sparse. In order to better relate similar actions, we produced another interaction network using program output as our state definition [6].

4.2 States and Transitions
Based on the data collected, we can divide the set of observed states into classes. First among these is the start state, in which the problem begins. By definition, every player’s path must begin at this state. Next is the set of goal states, in which all buttons on the stage are pressed. These are reported by the game as correct solutions. Any complete solution, by definition, ends at one such state. Among states which are neither start nor goal states, there are three important classifications: intermediate states (states a robot moves through during a correct solution), mistake states (states a robot does not move through during a correct solution), and error states (states which result from illegal output, like attempting to move the robot out of bounds). Based on these types of states, we classified our hints according to the transitions they represent.

Figure 2: Several generated hints for a simple puzzle. The blue icon represents the robot. The ‘X’ icon represents a goal. Shaded boxes are boxes placed on goals, while unshaded boxes are not on goals. Hint F in this figure depicts the start and most common goal state of the puzzle.

4.2.1 Subgoal Transition
(start/intermediate) → (intermediate/goal) These transitions occur when a student moves the robot to an intermediate state rather than directly to the goal. Since players run their programs to produce output, we speculate that these may represent subgoals such as moving a box onto a specific switch. After accomplishing that task, the user then appends to their program, moving towards a new objective, until they reach a goal state. Hint B in Figure 2 shows a hint generated from such a transition.

4.2.2 Correction Transition
(error/mistake) → (intermediate/goal) This transition occurs when a student makes and then corrects a mistake. These are especially useful because we can offer hints based on the type of mistake. Hints D and E in Figure 2 show hints built from this type of transition; however, hint E shows a case where a student resolved the mistake in a suboptimal way.

4.2.3 Simple Solution Transition
(start) → (goal) This occurs when a student enters an entire, correct program and solves the puzzle in one attempt. This makes such transitions not particularly useful for generating hints, other than showing a potential solution state of the puzzle. Hint F in Figure 2 shows this type of transition.

4.2.4 Rethinking Transition
(intermediate) → (intermediate/goal) This transition occurs when, rather than appending to the program as in a subgoal transition, the user deletes part or all of their program before moving towards a new goal. As a result, the first state is unrelated to the next state the player reaches, and offering that next state as a hint would likely not help guide a different user. Hint A in Figure 2 shows an example of this. Finding and recognizing these transitions is an important direction for future work.
4.2.5 Error Transition
(start/intermediate) → (mistake/error) This corresponds to
a program which walks the robot out of bounds, into an
object, or other similar errors. While we disregarded these
as hints, this type of transition may still be useful. In such
a case, the last legal output before the error could be a
valuable state. Hint C in Figure 2 is one such case.
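These transition classes can be recognized mechanically from the endpoint labels of an edge. The sketch below is our own illustration; the state labels and function name are assumptions, not identifiers from the BOTS implementation.

```python
# State classes from Section 4.2; labels and function name are our own.

def classify_transition(src: str, dst: str) -> str:
    """Map an observed edge src -> dst onto the categories of Section 4.2."""
    if src == "start" and dst == "goal":
        return "simple solution"      # 4.2.3: puzzle solved in one attempt
    if src in ("start", "intermediate") and dst in ("mistake", "error"):
        return "error"                # 4.2.5: run produced illegal/mistaken output
    if src in ("error", "mistake") and dst in ("intermediate", "goal"):
        return "correction"           # 4.2.2: the student fixed a mistake
    if src in ("start", "intermediate") and dst in ("intermediate", "goal"):
        # 4.2.1 (subgoal) and 4.2.4 (rethinking) share endpoint classes;
        # telling them apart requires diffing the underlying programs
        # (appended vs. deleted code), which Section 4.2.4 leaves to
        # future work.
        return "subgoal or rethinking"
    return "unclassified"

assert classify_transition("start", "goal") == "simple solution"
assert classify_transition("error", "intermediate") == "correction"
```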

4.3 Next Steps
While this approach helped us identify interesting transitions, and significantly reduced the sparseness of the Interaction Network by merging states with similar output, we violate several assumptions of the Hint Factory technique by using user compilation as an action. Essentially, the cost of an action can vary widely. In the most extreme examples, the best next state selected by Hint Factory will simply be the goal state.

4.4 Current Hint Policy
Our current hint selection policy is the same as the one used in the logic tutor Deep Thought, with a few exceptions [9]. We combine all student solution paths into a single graph, mapping identical states to one another (comparing either the programs or the output). Then, we calculate a fitness value for each node. We assign a large positive value (100) to each goal state, a low value (0) to dead-end states, and a step cost (1) for each step taken. Setting a non-zero cost on actions biases us towards shorter solutions. We then calculate the fitness value V(s) for each state s, where R(s) is the initial fitness value of the state, γ is a discount factor, and P_a(s, s′) is the observed frequency with which users in state s go to state s′ next via taking the action a. The equation for determining the fitness value of a state is as follows:

    V(s) := R(s) + γ max_a Σ_{s′} P_a(s, s′) V(s′)    (1)

However, in our current representation there is only one available action from any state: “run.” Different players using this action will change their programs in different ways between runs, so it is not useful to aggregate across all the possible resulting states. Instead, we want to consider each resulting state on its own. As a result, we use a simplified version of the above, essentially considering each possible resulting state s′ as the definite result of its own action:

    V(s) := R(s) + γ max_{s′} P(s, s′) V(s′)    (2)

Since the action “run” can encompass many changes, selecting the s′ which maximizes the value may not always be the best choice for a hint. The difference between s and s′ can be quite large, and this is usually the case when an expert user solves the problem in one try, forming an edge directly between the “start” state and the “goal” state. These and other “short-circuits” make it difficult to assess which of the child nodes would be best to offer as a hint simply by using the calculated fitness value.

Figure 3: This subgraph shows a short-circuit where a player bypasses a chain of several steps.

Another problem which arises from this state representation is seen in Hints C and E above. These hints show states where a student traveled from a state to a worse state before ultimately solving the problem. Since we limit our search for hintable states to the immediate child states s′ of s, we are unable to escape from such a situation if the path containing the error is the best or only observable path to the goal.

4.5 Proposed Hint Policies
One potential modification of the hint policy involves analyzing the programs or output on the nodes using some distance metric δ(s, s′). This measurement would be used in addition to the state’s independent fitness value R(s), which takes into account distance from a goal but is irrespective of the distance from any previous state. For example, in the short-circuit example above, using “difference in number of lines of code” as a distance metric, we could take into account how far the “Goal” state is from the “Start” state, and potentially choose a nearer state as a hint. This also helps correct for small error-correction steps in player solutions; if the change between the current state and the target hint state is very small, we may want to consider hinting toward the next step instead, or a different solution path altogether.

    V(s) := R(s) + γ max_a Σ_{s′} δ(s, s′) P(s, s′) V(s′)    (3)

One potential downside to this approach is that it requires somewhat more knowledge of the domain to be built into the model. If the distance metric used is inaccurate or flawed, there may be cases where we choose a very suboptimal hint. For example, using difference in lines of code as our distance metric, the change between a state where a player uses no functions and a state where the player has moved existing code into a function may appear very small. Hints selected in these cases might guide students away from desired outcomes in our game.

Another problem we need to resolve with our current hint policy, as discussed above, is the case where the best or only path to a goal from a given state s has an error as a direct child s′. One method of resolving this could be, instead of offering s′ as a hint, continuing to ask for next-step hints from s′ until some state is a hintable, non-error state. This solution requires no additional knowledge of the game domain; however, it is possible that the hint produced will be very far from s, or that we may skip over important information about how to resolve the error or misconception that led the student into state s in the first place.
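To make the current and proposed policies concrete, the following is a minimal sketch implementing Equation (2), the δ-weighted selection related to Equation (3), and the error-skipping rule described above. The graph encoding, function names, and toy example are our own; the constants follow Section 4.4, and folding the step cost into R(s) for non-goal states is one reading of that description, not the deployed BOTS code.

```python
# Sketch of the hint policies of Sections 4.4-4.5 (our illustration).
GOAL_VALUE, DEAD_END_VALUE, STEP_COST, GAMMA = 100.0, 0.0, 1.0, 0.9

def fitness(graph, goals, iterations=100):
    """Iteratively compute V(s) per Eq. (2): V(s) = R(s) + g*max_{s'} P(s,s')V(s')."""
    V = {s: (GOAL_VALUE if s in goals else DEAD_END_VALUE) for s in graph}
    for _ in range(iterations):
        for s, edges in graph.items():
            if s in goals or not edges:
                continue  # goal and dead-end states keep their initial values
            V[s] = -STEP_COST + GAMMA * max(p * V[t] for t, p in edges.items())
    return V

def next_step_hint(graph, V, s, delta=None):
    """Select the child of s with the highest value; supplying a distance
    metric delta(s, s') gives the delta-weighted selection of Eq. (3)."""
    weight = delta or (lambda a, b: 1.0)
    return max(graph[s], key=lambda t: weight(s, t) * graph[s][t] * V[t])

def non_error_hint(graph, V, s, errors):
    """Error-skipping policy from Section 4.5: follow best children past
    error states until a hintable, non-error state is reached."""
    t = next_step_hint(graph, V, s)
    while t in errors and graph.get(t):
        t = next_step_hint(graph, V, t)
    return t

# Toy interaction network with a "short-circuit" edge straight to the goal.
graph = {
    "start": {"goal": 0.2, "mid": 0.8},  # one expert solved it in one run
    "mid":   {"goal": 1.0},
    "goal":  {},
}
V = fitness(graph, goals={"goal"})
print(next_step_hint(graph, V, "start"))  # -> "mid", the frequent stepwise path
```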
Other modifications to the hint selection policy may produce better results than these. We hope to explore as many possible modifications as we can, seeing which produce the most suitable hints on our current dataset, before settling on an implementation for the live version of the game.

5. ACKNOWLEDGMENTS
Thanks to the additional developers who have worked on this project or helped with our outreach activities so far, including Aaron Quidley, Trevor Brennan, Irena Rindos, Vincent Bugica, Victoria Cooper, Dustin Culler, Shaun Pickford, Antoine Campbell, and Javier Olaya. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 0900860 and Grant No. 1252376.

6. REFERENCES
[1] I. F. de Kereki. Scratch: Applications in computer science 1. In Frontiers in Education Conference, 2008. FIE 2008. 38th Annual, pages T3B-7. IEEE, 2008.
[2] R. Garfield. Roborally. [Board Game], 1994.
[3] A. Hicks, B. Peddycord III, and T. Barnes. Building games to learn from their players: Generating hints in a serious game. In Intelligent Tutoring Systems, pages 312–317. Springer, 2014.
[4] W. Jin, T. Barnes, J. Stamper, M. J. Eagle, M. W. Johnson, and L. Lehmann. Program representation for automatic hint generation for a data-driven novice programming tutor. In Intelligent Tutoring Systems, pages 304–309. Springer, 2012.
[5] D. J. Malan and H. H. Leitner. Scratch for budding computer scientists. ACM SIGCSE Bulletin, 39(1):223–227, 2007.
[6] B. Peddycord III, A. Hicks, and T. Barnes. Generating hints for programming problems using intermediate output.
[7] A. Repenning, A. Basawapatna, and K. H. Koh. Making university education more like middle school computer club: facilitating the flow of inspiration. In Proceedings of the 14th Western Canadian Conference on Computing Education, pages 9–16. ACM, 2009.
[8] K. Rivers and K. R. Koedinger. Automating hint generation with solution space path construction. In Intelligent Tutoring Systems, pages 329–339. Springer, 2014.
[9] J. Stamper, T. Barnes, L. Lehmann, and M. Croy. The hint factory: Automatic generation of contextualized help for existing computer aided instruction. In Proceedings of the 9th International Conference on Intelligent Tutoring Systems Young Researchers Track, pages 71–78, 2008.
[10] D. Yaroslavski. LightBot. [Video Game], 2008.