The Impact of Data-driven Positive Programming Feedback: When it Helps, What Happens when it Goes Wrong, and How Students Respond

Preya Shabrina, Samiha Marwan, Min Chi, Thomas W. Price, Tiffany Barnes
NC State University
pshabri@ncsu.edu, samarwan@ncsu.edu, mchi@ncsu.edu, twprice@ncsu.edu, tmbarnes@ncsu.edu

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
This paper uses a case-based approach to investigate the impact of data-driven positive feedback on students' behaviour when integrated into a block-based programming environment. We embedded data-driven feature detectors to provide students with immediate positive feedback on completed objectives during programming, and deployed the system in one programming homework in a non-majors CS class. We conducted an expert analysis to determine when the data-driven detectors were correct or incorrect, and investigated the impact of the system on student behavior on the homework, specifically in terms of the time students spent in the system. Our results highlight when data-driven positive feedback helps students, what happens when it goes wrong, and how this impacted students' programming behavior. These case studies can inform the design of future data-driven systems that provide novices with positive feedback to help them persist while learning to program.

Keywords
Snap, Block-Based Programming, Data-Driven Hints, Positive Feedback, Adaptive Feedback

1. INTRODUCTION
Block-based programming environments are intended to provide novices with the ability to engage in motivating, open-ended, and creative programming, with features that limit syntax errors and allow simplified programming for interactive media [3, 2]. Some of these environments also include automated support such as misconception-driven feedback [4], next-step hints [6], or adaptive feedback [5], which have been shown to improve students' learning. Recently, data-driven automated hints and feedback [13, 12] are being explored because they can be generated automatically from historical or current log data with reduced involvement of experts. Researchers have investigated varied methods to generate automatic feedback (for example, hint and feedback generation from historical data [5], or next-step hints from the current code log [9]) and have also explored the quality of feedback and its impact on students' learning [15, 12, 6].

While prior work has evaluated the quality of automated feedback and its impact on students' performance or learning, not much is known about the impact of this feedback on students' programming behaviour when it fails to be reliable. This needs more investigation to understand what measures can be taken, or what support should be provided to students, to mitigate the adverse effects of data-driven automated feedback. Our prior study showed that data-driven positive feedback increased students' engagement with the programming task and improved their programming performance [5]. In this paper we present case studies of specific instances where our data-driven positive feedback system helped students to complete a programming assignment. Since the feedback is based on detecting objectives extracted from previous students' correct solutions and has no way to adapt to new behaviour, it is not always perfect. Thus, we also explore instances where the system either failed to confirm students' correct steps or provided misleading feedback, and we investigate students' responses to such events in terms of their programming behaviour and time spent on the task.
2. RELATED WORK
Several block-based programming environments have been designed to reduce the difficulties students face while learning a new programming language. For example, Alice [2] and Snap [3] provide drag-and-drop coding and immediate visual code execution. Research has shown that these environments are more engaging in terms of reduced idle time while solving a programming problem [8], and that they can produce positive learning outcomes in terms of grades [2] and the number of goals completed in a fixed amount of time [8].

To provide novice students with individualized tutoring support, researchers have integrated intelligent features into block-based programming environments. These features dynamically adapt teaching support to meet personalized needs [7]. For example, iSnap [10] is an extension of Snap that provides on-demand hints generated from students' code logs using the SourceCheck algorithm [11]. Gusukuma et al. [4] integrated automatic feedback based on learners' mistakes and underlying misconceptions into BlockPy [1], and showed that it significantly improved students' performance. Such data-driven approaches are being integrated to provide more automated, adaptive tutoring support in novice programming environments. For example, iSnap showcased the first attempt to integrate data-driven support into a block-based programming environment. Zhi et al. [13] proposed a method of generating example-based feedback from historical data for iSnap: they extracted correct-solution features from previous students' code, used those features to remove extraneous code from the current student's code, and produced pairs of example solutions that were provided on demand.

The impact and effectiveness of tutoring support and intelligent features integrated into novice programming environments have been explored from various perspectives. Zhi et al. [15] demonstrated the adoption of worked examples in a novice programming environment and found that worked examples helped students complete more tasks within a fixed period of time, though not significantly more. Price et al. [12] explored how the quality of contextual hints generated from students' current code affects help-seeking behaviour; they found that students who used hints at least once generally did not perform poorly, and that the quality of the first few hints is positively associated with future hint use and correlates with hint abuse. Marwan et al. [6] evaluated the impact of automated programming hints on students' performance and learning, and argued that automated hints improved learning on subsequent isomorphic tasks when accompanied by self-explanation prompts.

In this paper, we adopt a case-study approach to explore the positive impact of data-driven programming feedback when it is generated accurately in a block-based novice programming environment, and the negative impacts that can occur when the system fails, to shed light on the influence of such feedback on novice students' programming behaviour.

3. SYSTEM DESIGN
3.1 The Novice Programming Environment
We built the data-driven positive feedback (DDPF) system in iSnap [10], a block-based intelligent novice programming environment. This environment provides students with on-demand hints and logs all students' edits while programming (e.g. adding or deleting a block) as a code trace. This logging feature allows researchers or instructors to replay all of a student's edits in the programming environment and to recover the time of each edit.

3.2 Data-Driven Positive Feedback (DDPF) System
We built a system to provide positive feedback while students program a specific exercise in the block-based programming environment, using the data-driven feature detection algorithm described in [14]. This algorithm detects features, i.e. sequences of code blocks that reflect properties of correct solutions, from previous students' data. We used the 7 features [Table 1, Column 2] extracted with this algorithm from data on one programming exercise solved in the environment. The system converts the snapshot of each edit a student makes into an abstract syntax tree (AST) to detect completed features, and generates a sequence of 0s and 1s called the feature state (e.g. 1100000, where the first two 1s indicate the presence of the first two features and the 0s indicate the absence of the remaining features) for each student snapshot.
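To make the feature-state representation concrete, the sketch below shows one way such a bit string could be computed. It is a minimal Python illustration, not the detector from [14]: the features here are hand-written predicates over a simplified snapshot (a set of block labels) rather than sub-sequences learned from prior students' ASTs.

# Minimal sketch of turning one code snapshot into a feature state.
# The real detector in [14] learns features from prior students' data;
# here each "feature" is a hand-written predicate over a snapshot, and
# the snapshot is represented as a plain set of block labels.

def feature_state(snapshot_blocks, features):
    """Return a bit string like '110': 1 if a feature is present, else 0."""
    return ''.join('1' if detect(snapshot_blocks) else '0'
                   for detect in features)

# Hypothetical predicates standing in for the learned features.
FEATURES = [
    lambda b: 'custom_block' in b,         # e.g., a Squiral procedure exists
    lambda b: 'repeat' in b,               # some loop is used
    lambda b: {'move', 'variable'} <= b,   # move uses a variable
    # ... the remaining features would follow the same pattern
]

if __name__ == '__main__':
    snapshot = {'custom_block', 'repeat', 'move'}
    print(feature_state(snapshot, FEATURES))   # -> '110'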
3.3 Positive Feedback Interface
We designed an interface, based on our prior work [5], to provide positive feedback using the DDPF system described above. The interface includes a progress panel that displays a set of four objectives students need to complete to finish the programming task. Two experts in block-based programming converted the 7 data-driven features into these four objectives, each with a meaningful description to be displayed in the progress panel. While a student is programming, after each edit the DDPF system detects the feature state of the current snapshot and updates the corresponding objectives in the progress panel accordingly. Initially, all the objectives in the progress panel are deactivated. Once the system detects the presence of a feature, the color of its corresponding objective changes to green; if it detects the absence of a feature that was present before (i.e. a broken feature), the corresponding objective turns red.
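The panel update rule can be sketched as a comparison of consecutive feature states. The snippet below is an illustrative sketch only: it assumes a one-to-one mapping between features and displayed objectives, whereas the deployed system maps the 7 features onto 4 objectives.

# Sketch of the progress-panel update rule described above (illustrative only):
# an objective turns green when its feature appears, and red when a feature
# that was present before disappears (a "broken" feature).

def update_panel(prev_state, curr_state, panel):
    """panel maps objective index -> 'inactive' | 'green' | 'red'."""
    for i, (was, now) in enumerate(zip(prev_state, curr_state)):
        if now == '1':
            panel[i] = 'green'            # feature detected
        elif was == '1' and now == '0':
            panel[i] = 'red'              # previously completed, now broken
    return panel

panel = {i: 'inactive' for i in range(4)}    # all objectives start deactivated
panel = update_panel('0000', '1100', panel)  # two objectives completed
panel = update_panel('1100', '1000', panel)  # second objective broken
print(panel)   # {0: 'green', 1: 'red', 2: 'inactive', 3: 'inactive'}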
4. PROCEDURE
We deployed our system in an introductory computing course for non-majors at a public research university in the United States, in a Spring 2020 class of 27 students. In this course, students used iSnap (Section 3.1) to solve their in-class programming assignments and homeworks. We integrated our DDPF system into the programming environment for one homework, called Squiral, described in Section 4.1. The data we collected consist of code snapshots for every edit to student code, with corresponding timestamps. We also logged the objective feature states of all students' code snapshots, with the time when every objective was completed or broken. Afterwards, we manually checked the sequential code snapshots for each student and documented the following:

Early, Late, Incorrect and Just-In-Time Objective Detection: We investigated the code snapshots for each student and filtered out the snapshots where the system detected the completion of an objective. Two researchers evaluated whether each detection was early (the objective was detected before the student completed it), late (the student completed the objective earlier than when it was detected by the system), incorrect (an objective was detected that was never completed), or just-in-time (the objective was detected at the step where the student had just completed it).

Agreement between Researchers and Automatic Objective Detection: To measure the agreement between researchers and the data-driven objective detection, we marked each step of a student's code as: true positive (TP), where both the researchers and the system detected the completion of an objective at the same step; true negative (TN), where both the researchers and the system detected that an objective was broken at the same step; false positive (FP), where the system detected the completion of an objective that was not detected by the researchers; or false negative (FN), where the system detected an objective as broken at a step where the researchers detected no broken objective.

Idle and Active Time: We measured the total active and idle time spent by each student in the system while solving the programming homework. We also measured, for each student, the active and idle time spent before each objective was detected by the system, and the total active and idle time spent before the last change to any objective was detected by the system. We considered a time gap of more than 3 minutes [the 75th percentile of the frequency distribution of time gaps] to be idle time. A time gap of more than 10 minutes [the 95th percentile] was considered the start of a new session and thus was not counted towards either active or idle time.
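As an illustration of this accounting, the sketch below classifies the gaps between consecutive edit timestamps using the same 3-minute idle and 10-minute new-session thresholds. The input format (timestamps in minutes) and the function name are our own simplifications, not the actual logging format.

# Sketch of the active/idle time accounting described above.
# Gaps of <= 3 minutes count as active time, gaps of 3-10 minutes as idle
# time, and gaps of > 10 minutes start a new session and count toward neither.

IDLE_GAP = 3.0      # minutes (75th percentile of observed gaps)
SESSION_GAP = 10.0  # minutes (95th percentile of observed gaps)

def active_idle_time(edit_times_min):
    """edit_times_min: sorted timestamps of a student's edits, in minutes."""
    active, idle = 0.0, 0.0
    for prev, curr in zip(edit_times_min, edit_times_min[1:]):
        gap = curr - prev
        if gap > SESSION_GAP:
            continue                 # new session: dropped from both totals
        elif gap > IDLE_GAP:
            idle += gap
        else:
            active += gap
    return active, idle

print(active_idle_time([0, 1, 2, 6, 6.5, 20, 21]))  # -> (3.5, 4.0)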
4.1 The Squiral Assignment
The Squiral assignment is a programming homework that asks students to create a procedure to draw a spiraling, square-like shape. One possible solution and its corresponding output are depicted in Figure 1. Using correct student solutions collected from prior semesters, four objectives were identified [described in Section 3.3] and were provided to the students as sub-goals to achieve while solving the problem. The specific features required to complete each objective are shown in Table 1.

Figure 1: a) A sample expert solution to solve the Squiral assignment; b) Expected output

Table 1: Sample requirements to complete each objective (objective number and label; required features for completion)
Objective 1: Make a Squiral custom block and use it in your code [similar to creating a function and using it in the program]. Required features: create and use a custom block.
Objective 2: The Squiral custom block rotates the correct number of times. Required features: a nested loop, either repeat y * z, or repeat y inside repeat z, where y = rotation count and z = 4.
Objective 3: The length of each side of the Squiral is based on a variable. Required features: within the loop, move x steps, where x = length of a side.
Objective 4: The length of the Squiral increases with each side. Required features: pen down [outside the loop]; within the loop: 1. move x steps, 2. turn 90 degrees, 3. change x by some value.
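To make the objectives in Table 1 concrete, the sketch below draws a Squiral using Python's turtle module. It mirrors the structure the objectives describe (a custom procedure, rotations x 4 repetitions, a move by a variable side length, a 90-degree turn, and a growing length), but it is only an illustration: the parameter names and growth amount are our own choices, and the expert solution in Figure 1 is written in Snap blocks, not Python.

# Illustrative Squiral, mirroring the four objectives in Table 1
# (a custom procedure, rotations x 4 repetitions, a variable side length
# that grows each iteration, and a 90-degree turn per side).
import turtle

def squiral(rotations, length, growth=2):
    turtle.pendown()
    for _ in range(rotations * 4):   # objective 2: correct number of turns
        turtle.forward(length)       # objective 3: side length is a variable
        turtle.right(90)
        length += growth             # objective 4: length grows each side

if __name__ == '__main__':
    squiral(rotations=25, length=10)  # objective 1: define and call the block
    turtle.done()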
4.2 Research Questions
The goal of this study is to explore the impact of automated data-driven adaptive feedback on students' programming behavior. To achieve this goal, we aimed to answer the following research questions:
RQ1: How it Helps. How did data-driven feedback help students complete the assignment?
RQ2: What Happens when it Goes Wrong. How did the objective detectors impact student behavior, especially with regard to differences between researcher and algorithmic detection of objective completion?
RQ3: How Students Respond. How did data-driven feedback impact students' active and idle time while solving a programming problem?

5. RQ1: HOW IT HELPS
When we observed each student's solution at the end of their attempt, we found that 25 out of 27 students who were provided feedback had working solutions to the Squiral assignment, although, according to researchers, their solutions were not always perfect from a logical perspective. [For example, using 'size x size' in the nested loop instead of 'rotations x 4', where 'size' = 10 and 'rotations' = 25, will produce the same Squiral; but it is not logically correct, since the purpose of the nested loop is to draw 4 sides of the Squiral in each rotation.] Among the 27 students, two students (Jade and Lime) attempted to solve Squiral in the system both with and without feedback. To get specific insight into how the system helped students reach a correct solution, we examined the code logs of these two students, who each first attempted to solve Squiral without data-driven feedback and failed to complete the assignment, and later attempted it with data-driven feedback and succeeded. In this section, we present the case studies of these two students to demonstrate how the feedback system helped them fill the gaps in their code and led them to working solutions.

5.1 Case Study Lime
Student Lime without Data-Driven Positive Feedback: When attempting to solve Squiral without any hints or feedback [Figure 2], student Lime used a custom block with a parameter, and used 'move' and 'turn' statements within a loop in the custom block. However, there were three gaps in the code that the student could not figure out. First, a nested loop was required to iterate 'rotation count x 4' times. Second, the move statement used 'length x 2' as its parameter, whereas 'length' alone would have been sufficient. Finally, the variable used in the 'move' statement needed to be incremented at each iteration. The student spent 18 minutes and 18 seconds before giving up, unable to resolve these issues.

Figure 2: Student Lime's Solution when no Feedback was given

Student Lime with Data-Driven Positive Feedback: Later, student Lime attempted the homework again after receiving notice that the DDPF system was available. With feedback, Lime figured out the three issues and reached a correct solution [Figure 3]. The second objective suggests that a correct number of rotations needs to be used within the custom block; with this feedback, Lime used '4 x Rotations' in the 'repeat' block instead of '15' and completed the second objective. The third objective suggests the use of a variable in the 'move' statement; Lime used an initialized variable 'length' in the 'move' statement instead of 'length x 2', and the objective was marked green. Finally, Lime incremented 'length' within the loop, all objectives were completed, and they reached a correct solution. With data-driven adaptive feedback, Lime spent 29 minutes 51 seconds before reaching the correct solution. Recall that Lime gave up with an incorrect solution after around 18 minutes when no feedback was given.

Figure 3: Student Lime's Solution when Feedback was Given

5.2 Case Study Jade
Student Jade without Data-Driven Positive Feedback: Student Jade initially attempted to solve Squiral without data-driven positive feedback and spent 16 minutes and 55 seconds before giving up with an incorrect solution. Jade's code [Figure 4] contains 'repeat', 'move', and 'turn' statements on the stage. Jade created a custom block but only used it to initialize a variable, 'length', that was also a parameter to the block. The components needed to complete the objectives were partially present in Jade's code, but it suffered from organizational issues. Jade also could not figure out that the 'move' statement should use a variable instead of a constant and that the same variable needed to be incremented at each iteration. The number of repetitions in the repeat block was also incorrect.

Figure 4: Student Jade's Solution without Feedback

Student Jade with Data-Driven Positive Feedback: Like Lime, Jade attempted the homework again when the DDPF system was provided. When given feedback, Jade first created a custom block and used it on the stage, which got the first objective marked green. The second objective hints at using a loop that repeats the correct number of rotations within the custom block; this time Jade implemented the loop within the block and got the second objective correct. Within the loop, Jade used 'move', 'turn', and 'change' statements and reached the correct solution [Figure 5] with all objectives marked green. With adaptive feedback, it took Jade 14 minutes 29 seconds to reach a correct solution, whereas without feedback Jade gave up with an incorrect solution after spending over 16 minutes on the problem.

Figure 5: Student Jade's Solution when Feedback was Given

5.3 Findings
The two case studies of Lime and Jade presented in this section demonstrate that the feedback system was able to help students fill the gaps in their code to reach a correct solution. In one case this came at the cost of higher active time; in the other, the student reached a correct solution in less time when feedback was provided. We also observed that most students had a working solution [capable of drawing a Squiral] at the end of their attempts, although not one that was logically 100% correct according to researchers. Moreover, almost all (25 out of 27) students explored the syntactic constructs required to complete the objectives (e.g. move, turn, iteration, variables), which potentially indicates that they closely followed the objectives to accomplish the assignment.
6. RQ2: WHAT HAPPENS WHEN IT GOES WRONG
To answer RQ2, we manually walked through the sequential code snapshots of each student's attempt to solve Squiral, generated case studies where the system went wrong, and observed students' problem-solving approach. Below we present three case studies demonstrating our system's potential impact on student behavior when it could not provide reliable feedback. We selected one case where the student completed an objective but the system could not detect it (an FN case), and two cases where the system detected completed objectives when the objectives were not completed (according to researchers), which led students to an incorrect solution or made them stop early.

6.1 Case Study Azure: FN Cases Causing Students to Work More than Necessary
Student Azure started solving Squiral by creating a custom block and got the first objective correct. Azure used two parameters, 'size' and 'length', to denote the number of rotations and the length of the first side of the innermost loop. They created a loop with a 'repeat' block with the correct number of rotations ('size x 4') and got the second objective correct. As Azure used the 'length' parameter in the 'move' statement within the loop, they got the third objective correct. Then Azure added a 'turn' statement and incremented the 'length' variable. At this point, the fourth objective was completed according to researchers. However, the objective went undetected by the system [an FN case], because Azure used a 'turn' statement that was different from those used in the previous students' solutions [from which the objectives were extracted and detected]. According to researchers, Azure's solution was 100% correct at this point [Figure 6]. It took this student only 2 minutes 24 seconds to reach the correct solution.

Figure 6: Correct Solution Initially Implemented by Azure

Because the fourth objective was not detected, Azure kept working on their code and made several changes that led them to an incorrect solution. Finally, Azure ended up submitting a solution that was also 100% correct according to researchers, but slightly different from their initial solution: in the submitted solution, Azure removed the 'length' variable from the parameter list of the custom block. The fourth objective was still undetected. While making these changes, the student spent an additional 12 minutes 5 seconds in the system, almost 5 times the amount of time they had spent to get a correct solution in the first place.

We observed a similar situation for student Blue, who worked for a total of 1 hour 43 minutes 16 seconds. Blue reached a correct solution at 1 hour 7 minutes 11 seconds according to researchers, but one objective (the fourth) went undetected. Blue kept working for another 36 minutes 5 seconds (almost 50% of the time taken to reach the correct solution in the first place).

6.2 Case Study Cyan: FP Cases Leading Students to an Incorrect Solution
Student Cyan created a custom block, used it on the stage, and got the first objective correct. Cyan used two parameters in the custom block and used one of them in a 'move' statement within a nested 'repeat' block, which got the second and third objectives correct. However, Cyan implemented another nested loop and added a 'change' statement within that loop on the stage instead of adding them to the custom block. The system detected the objective and marked the fourth objective green. At this point, student Cyan had all objectives correct [FP cases], but the code [Figure 7] was unable to draw a Squiral.

Figure 7: Incorrect Solution Initially Implemented by Student Cyan

Later, Cyan removed the 'change' statement from the stage, which caused the fourth objective to be broken, while removing the custom block from the stage was breaking other objectives. Cyan then moved the 'change' statement into the custom block and corrected the rotation count in the 'repeat' statement. At this point, the solution was correct and similar to Figure 1a. The incorrectly detected objectives had led Cyan to a non-working solution; in this case, Cyan had to ignore the detectors and do extra work to reach a correct solution.
We observed a similar situation for three other students. One of them got four objectives correct, and the code was able to draw a Squiral, but according to researchers the code had one programmatic problem: it drew three sides of the Squiral using an inner loop and one side manually. The problem identified by researchers remained undetected by the system, and the student ended up submitting a partially correct solution. For the other two students, all four objectives were detected while syntactical problems were still present in their programs, as for student Cyan. These students realized that completing the objectives did not necessarily mean that their code could draw a Squiral. Although the FP cases led the students to incomplete code with four checked objectives, the incorrect output eventually compelled each student to modify their code and reach a 100% correct solution in the end.

6.3 Case Study Indigo: FP Cases Causing Students to Stop Early at a Partially Correct Solution
We found 6 additional cases where students got 4 objectives correct but the code was only partially correct (FP cases) according to researchers. However, their programs were able to draw a Squiral as required. In these cases, the students finished the attempt early and submitted the partially correct solution. We present the case study of student Indigo here.

Student Indigo's solution [Figure 8] had objectives 1, 3, and 4 correctly completed according to researchers, and these objectives were detected by the objective detection system as well. Indigo created a custom block and used it on the stage [required to complete objective 1]. They used 'pen down' and added 'move', 'turn', and 'change' statements accordingly [required to complete objective 4] within a nested loop implemented with two 'repeat' statements. In the 'move' statement, Indigo used an initialized variable, 'Length' [required to complete objective 3], and incremented the value of 'Length' by 10 at each iteration. However, the rotation count used in the nested loop was wrong: one of the 'repeat' statements should have the count of rotations and the other should have a constant 4, indicating the 4 sides of the square drawn at each rotation. The objective detection system detected the use of the nested loop and marked objective 2 green [an FP case]. The code was able to draw a Squiral, but the implementation was not completely correct. Still, once student Indigo got 4 objectives correct, they submitted their solution.

Figure 8: Solution Submitted by Student Indigo

6.4 Findings
We observed cases where students reached a correct solution but could not recognize it as correct, because their completed objectives were not detected by the system; they continued working on the assignment for longer than necessary. We also observed cases where students got 4 objectives correct but their solution was not even working, i.e. their code was not able to draw a Squiral because of organizational or syntactical problems. Only in these cases did the students realize that completing the objectives is not enough; they then modified the code, overriding the objective detectors, to reach a working solution. Our third case study showed that students who got 4 objectives correct with working code submitted their solutions even if their code was not fully correct according to researchers. In these cases, all the students relied on objective detection and submitted partially correct solutions. These case studies potentially indicate students' high reliance on the objective detection feedback, since they did not seem to question the feedback system as long as they had a working program that drew the correct shape. However, students did not rely on their own skill to determine when their code was correct. Furthermore, in the cases when objectives went undetected, students sometimes even ignored the produced output showing that they had a working solution.
7. RQ3: HOW STUDENTS RESPOND
To answer RQ3, we investigated how the correctness of objective detection in the early phases of a problem-solving attempt impacted the later phases of the attempt. We divided the total time each student spent in the system into three phases [Figure 9]: a) Phase A, in which objectives were detected for the first time; b) Phase B, in which changes in previously detected objectives were detected; and c) Phase C, in which students spent time in the system but no change in the objectives was detected. For example, a student got objectives 1, 3, and 4 marked green within 20 minutes of starting the attempt [Phase A]. The student then continued working for another 10 minutes [Phase B] without seeing a new objective [objective 2] go green, although the previously detected objectives were broken and corrected several times. Finally, the student spent another 5 minutes [Phase C] during which no change in any of the objectives was detected. We related the correct, incorrect, early, and late detection ratios in Phase A to the active and idle time spent in Phases A, B, and C, to understand whether correct, incorrect, early, or late detection regulates the time or effort students put into the assignment.

Figure 9: Phases in a Student's Attempt to Solve Squiral
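The phase boundaries described above can be computed from the logged objective events. The sketch below is a simplified version of that bookkeeping under our own assumed event representation (a list of (minute, kind) pairs); it reproduces the worked example in the text (a 20-minute Phase A, a 10-minute Phase B, and a 5-minute Phase C).

# Sketch of splitting one student's attempt into Phases A, B, and C.
# events: list of (minute, kind) with kind in {'first_detect', 'change', 'edit'};
# 'first_detect' marks the first time an objective was detected,
# 'change' marks a later break or re-completion of an already detected objective.

def phase_durations(events, end_minute):
    first_detects = [t for t, k in events if k == 'first_detect']
    changes = [t for t, k in events if k == 'change']
    a_end = max(first_detects) if first_detects else 0.0
    b_end = max(changes) if changes and max(changes) > a_end else a_end
    return {'A': a_end,               # until the last first-time detection
            'B': b_end - a_end,       # until the last change to a detected objective
            'C': end_minute - b_end}  # remaining time with no objective changes

events = [(5, 'first_detect'), (12, 'first_detect'), (20, 'first_detect'),
          (25, 'change'), (30, 'change')]
print(phase_durations(events, end_minute=35))   # -> {'A': 20, 'B': 10, 'C': 5}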
Table 2: Active and Idle Time range before Objectives got detected for the first time
Objective 1: Active time ~0-12.45 min.; Idle time 0-9.27 min.
Objective 2: Active time 0.1-84.5 min.; Idle time 0-10.1 min.
Objective 3: Active time 0.4-17.9 min.; Idle time 0-13.25 min.
Objective 4: Active time 0.5-18.2 min.; Idle time 0-19 min.

Active and Idle Time Observed in Phase A: In this phase, objectives were detected for the first time by the system. Students took a wide range of time before an objective was detected [Table 2]. We plotted average active and idle time against correct objective detection ratios and observed that students with a higher correct objective detection ratio had a shorter Phase A in terms of active time [Figure 10a]. For one student, this phase did not occur because an objective was never detected in the student's code. In this phase, only a few cases of idle time were found: 16 out of 27 students had no idle time at all, 7 students had idle time ranging from 3-5 minutes, and the remaining 4 students had idle time ranging from 10-24 minutes. We observed that a higher early detection rate (over 25%) shows a decreasing trend in average idle time [Figure 10b]. This may potentially indicate that positive feedback can be motivating to students, even if it is provided early.

Figure 10: a) Active Time in Phase A against Correct Objective Detection Ratio; b) Idle Time in Phase A against Early Objective Detection Ratio

Active and Idle Time Observed in Phase B: For 11 students, Phase B did not occur at all, either because an objective was never detected or because the students submitted their code after all of the objectives were detected for the first time in Phase A. 10 students spent more than 0 but less than 10 minutes in Phase B, and 6 students spent 10-35 minutes. 4 of the 6 students who spent a higher amount of time in this phase had a high early detection ratio in Phase A (50-75%), and 1 student had a high incorrect detection ratio (50%). These students did not have correct solutions, even though some or all of the objectives were detected in Phase A. In this phase, 21 out of 27 students had no idle time; 6 students had idle time ranging from 3 to 24 minutes. When we plotted average active and idle time in Phase B against the correct and incorrect objective detection ratios in Phase A, we observed that a higher correct detection rate (>25%) in Phase A seemed to decrease the active time spent in Phase B [Figure 11a]. This means that correct detections helped students complete their programs more quickly: correct objective detections in Phase A pushed the students towards the end of their attempt. However, incorrect objective detection in Phase A decreased idle time in Phase B and caused the students to continue actively working [Figure 11b].

Figure 11: a) Active Time in Phase B against Correct Objective Detection Ratio in Phase A; b) Idle Time in Phase B against Early Objective Detection Ratio in Phase A

Active and Idle Time Observed in Phase C: This phase covers the time when no change was detected in any of the objectives. In Phase C, one of the following scenarios occurs: 1) most of the objectives were detected in earlier phases, the student had a working solution, and they were making minor modifications without impacting the objectives; or 2) one, more than one, or all of the objectives were undetected, and the student was working on the assignment but submitted the attempt without another objective being detected. We observed that when the first scenario occurred, students spent only a few minutes in Phase C and then submitted their code: 18 out of 27 students spent only 0.1-8 minutes in the system after scenario 1 occurred. Even if the objective detection was wrong, these students relied on the system and submitted their code. We also observed that a higher early detection ratio in Phase A led to decreased average active time in Phase C [Figure 12]. The remaining 9 students spent 12-56 minutes in this phase. Scenario 2 played out for 7 of these 9 students, who all submitted the program with some incomplete objectives.

Figure 12: Active Time in Phase C against Early Objective Detection Ratio in Phase A

7.1 Findings
The results of our analysis showed that the active and idle time spent in Phase B and Phase C are, in some cases, associated with the quality of detection in Phase A. We observed that correct objective detection in Phase A that led to a working solution pushed students to finish their attempt, making Phases B and C shorter, whereas incorrect objective detection in Phase A that led to a non-working solution decreased the idle time observed in Phase B and caused students to work more. However, the active time in such cases varied from student to student and depended on the extent to which objective detection went wrong. Our case studies in Section 6 indicate students' significant reliance on the feedback. This reliance interacted differently with the system depending on whether the feedback was correct or incorrect and whether or not the student's code output appeared correct, and these differences are reflected in students' responses and effort in terms of active and idle time.
8. DISCUSSION
Our case studies and analysis demonstrated that, although our feedback system could guide students to complete a programming assignment, it could also mislead students into doing extra work, submitting a partially correct solution, or ending up with a non-working solution when it provided inaccurate feedback. However, the visual feedback of objectives turning green [even for an incorrect or too-early detection] reduced idle time, giving us an indication that the feedback was motivating for students. All of these observed impacts could be the result of students' high reliance on the feedback system. To prevent the negative impacts generated by incorrect detections, mechanisms to prevent such detections must be explored. We observed our system fail in the presence of new student behaviours: since our interventions use data from prior students, and new students behave in new ways, the system has not had the opportunity to learn and adapt to their behavior. Thus, any system like ours must have an iterative process for integrating new behaviors that may arise from diversity in students and instructors.

9. CONCLUSION AND FUTURE WORK
This paper presents case studies that provide important insights into the impacts of positive feedback on novice programmers from multiple perspectives. We present case studies that shed light on how the feedback system helped two students to complete a programming task after they had failed to complete it on their first attempts without feedback. While these scenarios highlighted the usefulness of positive feedback, our case studies of events when the system could not provide accurate feedback offer insights into the impact feedback failures can have on students' responses. These insights can be highly useful for deciding on measures to mitigate adverse impacts or for formulating adaptations to handle unexpected behaviors; this may involve expressing a confidence level for detectors, or inviting students to self-explain how and why their solutions are correct. The primary contributions of this work are: 1) case studies demonstrating how the positive feedback system can help students produce a working solution for a programming task; and 2) case studies and code-trace-based analyses that give important insights into how a data-driven positive feedback system impacts students' behaviour when the system goes wrong. Our results show interesting relationships between the correctness of the provided feedback and the time students spent on the task or in the system. In future work, we plan to explore these impacts in larger controlled studies and on other programming tasks, and to explore how we can adapt our system to balance students' understanding of their own code with reliance on feedback, to promote learning.

10. ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation under grant 1623470.
11. REFERENCES
[1] A. C. Bart, J. Tibau, E. Tilevich, C. A. Shaffer, and D. Kafura. BlockPy: An open access data-science environment for introductory programmers. Computer, 50(5):18–26, 2017.
[2] W. Dann, D. Cosgrove, D. Slater, D. Culyba, and S. Cooper. Mediated transfer: Alice 3 to Java. In Proceedings of the 43rd ACM Technical Symposium on Computer Science Education, pages 141–146, 2012.
[3] D. Garcia, B. Harvey, and T. Barnes. The beauty and joy of computing. ACM Inroads, 6(4):71–79, 2015.
[4] L. Gusukuma, A. C. Bart, D. Kafura, and J. Ernst. Misconception-driven feedback: Results from an experimental study. In Proceedings of the 2018 ACM Conference on International Computing Education Research (ICER '18), pages 160–168, 2018.
[5] S. Marwan, G. Gao, S. Fisk, T. W. Price, and T. Barnes. Adaptive immediate feedback can improve novice programming engagement and intention to persist in computer science. In Proceedings of the International Computing Education Research Conference (forthcoming), 2020.
[6] S. Marwan, J. Jay Williams, and T. Price. An evaluation of the impact of automated programming hints on performance and learning. In Proceedings of the 2019 ACM Conference on International Computing Education Research, pages 61–70, 2019.
[7] T. Murray. Authoring intelligent tutoring systems: An analysis of the state of the art. 1999.
[8] T. W. Price and T. Barnes. Comparing textual and block interfaces in a novice programming environment. In Proceedings of the Eleventh Annual International Conference on International Computing Education Research, pages 91–99, 2015.
[9] T. W. Price, Y. Dong, and T. Barnes. Generating data-driven hints for open-ended programming. International Educational Data Mining Society, 2016.
[10] T. W. Price, Y. Dong, and D. Lipovac. iSnap: Towards intelligent tutoring in novice programming environments. In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, pages 483–488, 2017.
[11] T. W. Price, R. Zhi, and T. Barnes. Evaluation of a data-driven feedback algorithm for open-ended programming. In Proceedings of the International Conference on Educational Data Mining, 2017.
[12] T. W. Price, R. Zhi, and T. Barnes. Hint generation under uncertainty: The effect of hint quality on help-seeking behavior. In International Conference on Artificial Intelligence in Education, pages 311–322. Springer, 2017.
[13] R. Zhi, S. Marwan, Y. Dong, N. Lytle, T. W. Price, and T. Barnes. Toward data-driven example feedback for novice programming.
[14] R. Zhi, T. W. Price, N. Lytle, Y. Dong, and T. Barnes. Reducing the state space of programming problems through data-driven feature detection. In EDM Workshop, 2018.
[15] R. Zhi, T. W. Price, S. Marwan, A. Milliken, T. Barnes, and M. Chi. Exploring the impact of worked examples in a novice programming environment. In Proceedings of the 50th ACM Technical Symposium on Computer Science Education, pages 98–104, 2019.