=Paper=
{{Paper
|id=Vol-1446/GEDM_2015_Submission_10
|storemode=property
|title=BOTS: Selecting Next-Steps from Player Traces in a Puzzle Game
|pdfUrl=https://ceur-ws.org/Vol-1446/GEDM_2015_Submission_10.pdf
|volume=Vol-1446
|dblpUrl=https://dblp.org/rec/conf/edm/HicksDZCB15
}}
==BOTS: Selecting Next-Steps from Player Traces in a Puzzle Game==
Drew Hicks, Yihuan Dong, Rui Zhi, Veronica Cateté, Tiffany Barnes
North Carolina State University, 911 Oval Drive, Raleigh, NC 27606
aghicks3@ncsu.edu, ydong2@ncsu.edu, rzhi@ncsu.edu, vmcatete@ncsu.edu, tmbarnes@ncsu.edu

ABSTRACT
In the field of Intelligent Tutoring Systems, data-driven methods for providing hints and feedback are becoming increasingly popular. One such method, Hint Factory, builds an interaction network out of observed player traces. This data structure is used to select the most appropriate next step from any previously observed state, which can then be used to provide guidance to future players. However, this method has previously been employed in systems in which each action a player may take requires roughly similar effort; that is, the “step cost” is constant no matter which action is taken. We hope to apply similar methods to an interaction network built from player traces in our game, BOTS; however, each edge in our network can represent a varied amount of effort on the part of the student, so a different hint selection policy may be needed. In this paper, we discuss the problems with our current hint policy, which assumes all edges have the same cost, and then discuss potential alternative hint selection policies we have considered.

Keywords
Hint Generation, Serious Games, Data Mining

1. INTRODUCTION
Data-driven methods for providing hints and feedback are becoming increasingly popular, and are especially useful for environments with user-generated or procedurally-generated content. One such method, Hint Factory, builds an interaction network out of observed player traces. An Interaction Network is a complex network of student-tutor interactions, used to model student behavior in tutors and to provide insight into problem-solving strategies and misconceptions. This data structure can be used to provide hints by treating the Interaction Network similarly to a Markov Decision Process and selecting the most appropriate next step from the requesting user's current state. This method has successfully been employed in systems in which each action a player may take is of similar cost; for example, in the Deep Thought logic tutor, each action is the application of a particular axiom. Applying this method to an environment where actions have different costs, or where outcomes have varying value, will require some adaptations. In this work, we discuss how we will apply Hint Factory methods to an interaction network built from player traces in a puzzle game, BOTS. In BOTS, each “action” is the set of changes made to the program between runs, so using the current hint selection policy would result in very high-level hints comprising a great number of changes to the student's program. Since this is undesirable, a different hint selection policy may be needed.

2. DATA-DRIVEN HINTS AND FEEDBACK
In the ITS community, several methods have been proposed for generating hints and feedback from previous observations of users' solutions or behavior. Rivers et al. propose a data-driven method, based on Hint Factory, to generate hints automatically for novice programmers [8]. They present a domain-independent algorithm which automates hint generation. Their method relies on a solution space: a graph in which each node represents a candidate solution and each edge represents the action used to move from one state to another. Because there are many ways to solve a programming problem, the full solution space is huge and impractical to use, so a canonicalization model reduces its size: all states are transformed into canonicalized abstract syntax trees (ASTs), and states whose canonical forms are identical are combined. Hint generation then operates on the simplified solution space. If the current state is incorrect and not in the solution space, a path construction algorithm finds the optimal goal state closest to the current state. The algorithm uses change vectors to denote the difference between the current state and the goal state; applying a change vector to the current state forms an intermediate state. Intermediate states are measured by a desirability score representing the value of the state, and the path construction algorithm generates optimal next states based on the ranking of these scores. The resulting new path is added to the solution space, from which appropriate hints can be generated.
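To make the canonicalization step concrete, here is a minimal sketch assuming Python programs and using the standard ast module. Renaming variables in order of first use is only one example normalization, our illustration rather than the exact model of Rivers and Koedinger:

```python
import ast

def canonicalize(source: str) -> str:
    """Rename variables in order of first use, so that solutions
    differing only in identifier names collapse to one state."""
    tree = ast.parse(source)
    names = {}
    for node in ast.walk(tree):
        if isinstance(node, ast.Name):
            if node.id not in names:
                names[node.id] = f"v{len(names)}"
            node.id = names[node.id]
    return ast.dump(tree)  # canonical string form of the AST

# Two students' programs that differ only in naming...
a = "total = 0\nfor x in items:\n    total = total + x"
b = "s = 0\nfor n in items:\n    s = s + n"
assert canonicalize(a) == canonicalize(b)  # ...merge into one node
```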
Jin et al. propose linkage graphs to generate hints for programming courses [4]. A linkage graph uses nodes to represent program statements and directed edges to indicate ordered dependencies among those statements; Jin's approach stores the linkage graph as a matrix for computation. To generate the linkage matrix, they first normalize the variables in programs using an instructor-provided variable specification file. After variable normalization, they sort the statements in three steps: (i) preprocessing, which breaks a single declaration of multiple variables (e.g., int a, b, c) into multiple declaration statements (e.g., int a; int b; int c;); (ii) creating statement sets according to variable dependencies, which puts independent statements into the first set, statements that depend only on statements in the first set into the second set, statements that depend only on statements in the first and second sets into the third set, and so on; (iii) in-set statement sorting, during which the statements within each set are sorted in decreasing order by their variable signatures. In hint generation, they first generate linkage graphs from a set of correct solutions, as the sources for hint generation. They also compose the intermediate steps observed during program development into a large linkage graph, assign a reward value to each state and to the correct solutions, and apply value iteration to create a Markov Decision Process (MDP). When a student requests a hint, the tutor generates a linkage graph for the student's partial program and tries to find the closest match in the MDP. If a match is found, the tutor generates a hint from the next best state, the one with the highest assigned value. If no match is found in the current MDP, meaning the student is taking a different approach from the existing correct solutions, the tutor tries to modify those correct solutions to fit the student's program and then provides hints.
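As an illustration of step (ii), the following sketch (our own, assuming each statement's sets of read and written variables are already known) partitions statements into ordered dependency sets:

```python
def dependency_sets(statements):
    """Partition statements into ordered sets: a statement joins the
    earliest set once every variable it reads has been written by a
    statement in a strictly earlier set.

    Each statement is a (reads, writes) pair of variable-name sets.
    Returns a list of lists of statement indices.
    """
    remaining = list(enumerate(statements))
    defined = set()              # variables written by earlier sets
    sets = []
    while remaining:
        current = [(i, s) for i, s in remaining if s[0] <= defined]
        if not current:          # cyclic or unsatisfiable dependencies
            raise ValueError("unresolvable dependencies")
        chosen = {i for i, _ in current}
        sets.append(sorted(chosen))
        defined |= {v for _, s in current for v in s[1]}
        remaining = [(i, s) for i, s in remaining if i not in chosen]
    return sets

# int a; int b = a; int c = a + b;  ->  [[0], [1], [2]]
print(dependency_sets([(set(), {"a"}), ({"a"}, {"b"}), ({"a", "b"}, {"c"})]))
```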
Hint Factory [9] is an automatic hint generation technique which uses Markov decision processes (MDPs) to generate contextualized hints from past student data. It consists of two main parts: an MDP generator and a hint provider. The MDP generator computes MDP values for all states seen in previous students' solutions. In this process, all of the students' solutions are combined into a single graph, where each node represents a state and each edge represents an action a student took to move from one state to another. Once the graph is built, the MDP generator uses Bellman backup to assign values to all nodes, and after the values are updated, a hint file is generated. The hint provider uses this hint file to provide hints: when a student asks for a hint at an existing state, the hint provider retrieves the current state information and checks whether hints are available for that state. The action that leads to the subsequent state with the highest value is used to generate a hint sequence, which consists of four types of hints ordered from general to detailed; the hint provider then shows hints from the top of the sequence to the student. Hint Factory has been applied in logic tutors that help students learn logic proofs, where it could provide hints over 80% of the time.
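The first step, combining all observed traces into a single graph with transition counts, can be sketched as follows (our illustration of the general idea, not the Hint Factory implementation):

```python
from collections import defaultdict

def build_interaction_network(traces):
    """Combine student traces (lists of hashable states) into one graph.
    Returns {state: {next_state: observed_count}}."""
    edges = defaultdict(lambda: defaultdict(int))
    for trace in traces:
        for s, s_next in zip(trace, trace[1:]):
            edges[s][s_next] += 1
    return edges

# Three traces over toy states; shared steps merge automatically.
network = build_interaction_network([
    ["start", "A", "goal"],
    ["start", "A", "B", "goal"],
    ["start", "B", "goal"],
])
print(dict(network["start"]))  # {'A': 2, 'B': 1}
```

Normalizing each inner count map by its total gives the observed transition frequencies P(s, s') used in the equations below.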
3. BOTS
BOTS is a programming puzzle game designed to teach fundamental ideas of programming and problem-solving to novice computer users. The goal of the BOTS project is to investigate how best to use community-authored content within serious games and educational games. BOTS was inspired by games like LightBot [10] and RoboRally [2], as well as by the success of Scratch and its online community [1][5]. In BOTS, players take on the role of programmers writing code to navigate a simple robot around a grid-based 3D environment, as seen in Figure 1. The goal of each puzzle is to press several switches within the environment, which can be done by placing an object or the robot on them.

[Figure 1: The BOTS interface. The robot's program is along the left side of the screen. The “toolbox” of available commands is along the top of the screen.]

To program the robots, players use simple graphical pseudo-code, allowing them to move the robot, repeat sets of commands using “for” or “while” loops, and re-use chunks of code using functions. Within each puzzle, players' scores depend on the number of commands used, with lower scores being preferable. In addition, each puzzle limits the maximum number of commands, as well as the number of times each command can be used. For example, in the tutorial levels, a user may only use the “Move Forward” instruction 10 times. Therefore, if a player wants to make the robot walk down a long hallway, it is more efficient to use a loop repeating a single “Move Forward” instruction than to simply use several “Move Forward” instructions one after the other. These constraints are meant to encourage players to re-use code and optimize their solutions.

In addition to the guided tutorial mode, BOTS also contains an extensive “Free Play” mode, with a wide selection of puzzles created by other players. The game, in line with the “Flow of Inspiration” principles outlined by Alexander Repenning [7], provides multiple ways for players to share knowledge through authoring and modifying content. Players are able to create their own puzzles to share with their peers, and can play and evaluate friends' puzzles, improving on past solutions. Features such as peer-authored hints for difficult puzzles and a collaborative filtering approach to rating are planned next steps for the game's online element. We hope to create an environment where players can continually challenge their peers to find consistently better solutions for increasingly difficult problems.

User-generated content supports replayability and a sense of community for a serious game. We believe that user-created puzzles could improve interest, encouraging students to return to the game to solidify their mastery of old skills and potentially helping them pick up new ones.

4. ANALYSIS

4.1 Dataset
Data for the BOTS studies has come from a middle school computer science enrichment program called SPARCS. In this program, the students attend class on Saturdays for 4 hours, where computer science undergraduates teach them about computational thinking and programming. Students attend a total of 7 sessions, each on a different topic, ranging from security and encryption to game design. The students all attend the same magnet middle school. The demographics for this club are 74.2% male, 25.8% female, 36.7% African American, and 23.3% Hispanic. The students' grade distribution is 58% 6th grade, 36% 7th grade, and 6% 8th grade.

From these sessions, we collected gameplay data for 20 tutorial puzzles as well as 13 user-created puzzles. With this data, we created an Interaction Network in order to provide hints and feedback for future students [3]. However, using program edits as states, the interaction networks produced were very sparse. In order to better relate similar actions, we produced another interaction network using program output as our state definition [6].
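The contrast between the two state definitions can be sketched as follows; this is our illustration, and the names and world encoding are hypothetical rather than the actual BOTS data format:

```python
import hashlib

def edit_state_key(program_text: str) -> str:
    """State defined by the program itself: any textual edit yields a
    new state, which made our first network very sparse."""
    return hashlib.sha1(program_text.encode()).hexdigest()

def output_state_key(robot_pos, box_positions, pressed_switches) -> str:
    """State defined by the program's output: the world configuration
    after a run. Distinct programs with identical effects merge."""
    return repr((robot_pos, sorted(box_positions), sorted(pressed_switches)))

# Two different programs with the same effect share one output state.
p1 = "Forward\nForward\nForward"
p2 = "for 3:\n    Forward"
assert edit_state_key(p1) != edit_state_key(p2)
same_state = output_state_key((3, 0, 0), [], [])  # same world either way
```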
4.2 States and Transitions
Based on the data collected, we can divide the set of observed states into classes. First among these is the start state in which the problem begins. By definition, every player's path must begin at this state. Next is the set of goal states, in which all buttons on the stage are pressed. These are reported by the game as correct solutions, and any complete solution, by definition, ends at one such state. Among states which are neither start nor goal states, there are three important classifications: intermediate states (states a robot moves through during a correct solution), mistake states (states a robot does not move through during a correct solution), and error states (states which result from illegal output, like attempting to move the robot out of bounds). Based on these types of states, we classified our hints by the transitions they represent.

[Figure 2: Several generated hints for a simple puzzle. The blue icon represents the robot. The 'X' icon represents a goal. Shaded boxes are boxes placed on goals, while unshaded boxes are not on goals. Hint F in this figure depicts the start and most common goal state of the puzzle.]

4.2.1 Subgoal Transition
(start/intermediate) → (intermediate/goal). These transitions occur when a student moves the robot to an intermediate state rather than directly to the goal. Since players run their programs to produce output, we speculate that these may represent subgoals, such as moving a box onto a specific switch. After accomplishing that task, the user then appends to their program, moving towards a new objective, until they reach a goal state. Hint B in Figure 2 shows a hint generated from such a transition.

4.2.2 Correction Transition
(error/mistake) → (intermediate/goal). This transition occurs when a student makes and then corrects a mistake. These are especially useful because we can offer hints based on the type of mistake. Hints D and E in Figure 2 show hints built from this type of transition; however, hint E shows a case where a student resolved the mistake in a suboptimal way.

4.2.3 Simple Solution Transition
(start) → (goal). This occurs when a student enters an entire, correct program and solves the puzzle in one attempt. Such transitions are not particularly useful for generating hints, other than showing a potential solution state of the puzzle. Hint F in Figure 2 shows this type of transition.

4.2.4 Rethinking Transition
(intermediate) → (intermediate/goal). This transition occurs when, rather than appending to the program as in a subgoal transition, the user deletes part or all of their program and then moves towards a new goal. As a result, the first state is unrelated to the next state the player reaches, and offering it as a hint would likely not help guide a different user. Hint A in Figure 2 shows an example of this. Finding and recognizing these transitions is an important direction for future work.

4.2.5 Error Transition
(start/intermediate) → (mistake/error). This corresponds to a program which walks the robot out of bounds, into an object, or causes other similar errors. While we disregarded these as hints, this type of transition may still be useful: in such a case, the last legal output before the error could be a valuable state. Hint C in Figure 2 is one such case.
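Given per-state labels, the transition classes above can be assigned mechanically. A minimal sketch (our own; the label strings are hypothetical):

```python
def classify_transition(src: str, dst: str) -> str:
    """Assign one observed (state, next-state) pair to a transition class.
    Labels: "start", "intermediate", "mistake", "error", "goal"."""
    if src == "start" and dst == "goal":
        return "simple solution"
    if src in ("start", "intermediate") and dst in ("mistake", "error"):
        return "error"
    if src in ("error", "mistake") and dst in ("intermediate", "goal"):
        return "correction"
    if src == "start" and dst == "intermediate":
        return "subgoal"
    if src == "intermediate" and dst in ("intermediate", "goal"):
        # Separating subgoal from rethinking needs the program edit:
        # appending code suggests a subgoal; deleting suggests rethinking.
        return "subgoal or rethinking"
    return "unclassified"

assert classify_transition("start", "goal") == "simple solution"
assert classify_transition("mistake", "goal") == "correction"
```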
4.3 Next Steps
While this approach helped us identify interesting transitions, and significantly reduced the sparseness of the Interaction Network by merging states with similar output, we violate several assumptions of the Hint Factory technique by using a user's compilation (a “run”) as the action: the cost of an action can vary widely. In the most extreme examples, the best next state selected by Hint Factory will simply be the goal state.

[Figure 3: This subgraph shows a short-circuit where a player bypasses a chain of several steps.]

4.4 Current Hint Policy
Our current hint selection policy is the same as the one used in the logic tutor Deep Thought, with a few exceptions [9]. We combine all student solution paths into a single graph, mapping identical states to one another (comparing either the programs or the output). Then, we calculate a fitness value for each node. We assign a large positive value (100) to each goal state, a low value (0) to dead-end states, and a step cost (1) for each step taken; setting a non-zero cost on actions biases us towards shorter solutions. We then calculate fitness values V(s) for each state s, where R(s) is the initial fitness value for the state, γ is a discount factor, and P_a(s, s') is the observed frequency with which users in state s go to state s' next via taking action a. The equation for determining the fitness value of a state is as follows:

    V(s) := R(s) + \gamma \max_a \sum_{s'} P_a(s, s') V(s')        (1)

However, in our current representation there is only one available action from any state: “run.” Different players using this action will change their programs in different ways between runs, so it is not useful to aggregate across all the possible resulting states. Instead, we want to consider each resulting state on its own. As a result, we use a simplified version of the above, essentially considering each possible resulting state s' as the definite result of its own action:

    V(s) := R(s) + \gamma \max_{s'} P(s, s') V(s')        (2)

Since the action “run” can encompass many changes, selecting the s' which maximizes the value may not always be the best choice for a hint. The difference between s and s' can be quite large, and this is usually the case when an expert user solves the problem in one try, forming an edge directly between the “start” state and the “goal” state. These and other “short-circuits” make it difficult to assess which of the child nodes would be best to offer as a hint simply by using the calculated fitness value.

Another problem which arises from this state representation is seen in Hints C and E above. These hints show states where a student traveled from a state to a worse state before ultimately solving the problem. Since we limit our search for hintable states to the immediate child states s' of s, we are unable to escape from such a situation if the path containing the error is the best or only observable path to the goal.

4.5 Proposed Hint Policies
One potential modification of the hint policy involves analyzing the programs or output on the nodes, using some distance metric δ(s, s'). This measurement would be used in addition to the state's independent fitness value R(s), which takes into account distance from a goal but is irrespective of the distance from any previous state. For example, in the short-circuit example above, using “difference in number of lines of code” as a distance metric, we could take into account how far the “goal” state is from the “start” state, and potentially choose a nearer state as a hint. This also helps correct for small error-correction steps in player solutions: if the change between the current state and the target hint state is very small, we may want to consider hinting toward the next step instead, or toward a different solution path altogether.

    V(s) := R(s) + \gamma \max_a \sum_{s'} \delta(s, s') P(s, s') V(s')        (3)

One potential downside to this approach is that it requires somewhat more knowledge of the domain to be built into the model. If the distance metric used is inaccurate or flawed, there may be cases where we choose a very suboptimal hint. For example, using difference in lines of code as our distance metric, the change between a state where a player uses no functions and a state where the player has rewritten existing code into a function may appear very small; hints selected in such cases might guide students away from desired outcomes in our game.

Another problem we need to resolve with our current hint policy, as discussed above, is the case where the best or only path to a goal from a given state s has an error as a direct child s'. One method of resolving this could be, instead of offering s' as a hint, continuing to ask for next-step hints from s' until reaching a hintable, non-error state. This solution requires no additional knowledge of the game domain; however, it is possible that the hint produced will be very far from s, or that we may skip over important information about how to resolve the error or misconception that led the student into state s in the first place.
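To make these policies concrete, here is a minimal value-iteration sketch of Equation (2), with the δ-weighted variant of Equation (3) as a drop-in option and a next-step lookup that can skip past error states as proposed above. This is our illustration rather than the deployed BOTS code: the names are hypothetical, the max over s' mirrors the one-action simplification, and δ should shrink as states grow farther apart (e.g., an inverse difference in lines of code) so that nearer states are preferred.

```python
def value_iteration(edges, R, gamma=0.9, delta=None, sweeps=100):
    """Iteratively compute fitness values V(s) over an interaction network.

    edges: {s: {s_next: P}}, observed frequency of moving from s to s_next.
    R:     initial fitness for every state, e.g. 100 for goal states,
           0 for dead ends, and a step cost (e.g. -1) elsewhere.
    delta: optional distance metric for the Equation (3) variant;
           when None, this computes Equation (2).
    """
    V = dict(R)
    for _ in range(sweeps):
        for s, succs in edges.items():
            if not succs:
                continue
            V[s] = R[s] + gamma * max(
                (delta(s, s2) if delta else 1.0) * p * V[s2]
                for s2, p in succs.items()
            )
    return V

def next_step_hint(s, edges, V, is_error=lambda state: False):
    """Hint toward the highest-value child of s; optionally follow best
    children past error states until a hintable state is reached."""
    best = max(edges[s], key=lambda s2: V[s2])
    seen = {best}
    while is_error(best) and edges.get(best):
        best = max(edges[best], key=lambda s2: V[s2])
        if best in seen:         # guard against cycles in the network
            break
        seen.add(best)
    return best
```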
Other modifications to the hint selection policy may produce better results than these. We hope to examine as many modifications as we can, to see which produce the most suitable hints on our current dataset, before settling on an implementation for the live version of the game.

5. ACKNOWLEDGMENTS
Thanks to the additional developers who have worked on this project or helped with our outreach activities so far, including Aaron Quidley, Trevor Brennan, Irena Rindos, Vincent Bugica, Victoria Cooper, Dustin Culler, Shaun Pickford, Antoine Campbell, and Javier Olaya. This material is based upon work supported by the National Science Foundation Graduate Research Fellowship under Grant No. 0900860 and Grant No. 1252376.

6. REFERENCES
[1] I. F. de Kereki. Scratch: Applications in computer science 1. In Frontiers in Education Conference, 2008. FIE 2008. 38th Annual, pages T3B-7. IEEE, 2008.
[2] R. Garfield. RoboRally. [Board Game], 1994.
[3] A. Hicks, B. Peddycord III, and T. Barnes. Building games to learn from their players: Generating hints in a serious game. In Intelligent Tutoring Systems, pages 312–317. Springer, 2014.
[4] W. Jin, T. Barnes, J. Stamper, M. J. Eagle, M. W. Johnson, and L. Lehmann. Program representation for automatic hint generation for a data-driven novice programming tutor. In Intelligent Tutoring Systems, pages 304–309. Springer, 2012.
[5] D. J. Malan and H. H. Leitner. Scratch for budding computer scientists. ACM SIGCSE Bulletin, 39(1):223–227, 2007.
[6] B. Peddycord III, A. Hicks, and T. Barnes. Generating hints for programming problems using intermediate output.
[7] A. Repenning, A. Basawapatna, and K. H. Koh. Making university education more like middle school computer club: facilitating the flow of inspiration. In Proceedings of the 14th Western Canadian Conference on Computing Education, pages 9–16. ACM, 2009.
[8] K. Rivers and K. R. Koedinger. Automating hint generation with solution space path construction. In Intelligent Tutoring Systems, pages 329–339. Springer, 2014.
[9] J. Stamper, T. Barnes, L. Lehmann, and M. Croy. The Hint Factory: Automatic generation of contextualized help for existing computer aided instruction. In Proceedings of the 9th International Conference on Intelligent Tutoring Systems Young Researchers Track, pages 71–78, 2008.
[10] D. Yaroslavski. LightBot. [Video Game], 2008.