The Impact of Data-driven Positive Programming Feedback: When it Helps, What Happens when it Goes Wrong, and How Students Respond

Preya Shabrina, Samiha Marwan, Min Chi, Thomas W. Price, Tiffany Barnes
NC State University
pshabri@ncsu.edu, samarwan@ncsu.edu, mchi@ncsu.edu, twprice@ncsu.edu, tmbarnes@ncsu.edu

Copyright 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

ABSTRACT
This paper uses a case-based approach to investigate the impact of data-driven positive feedback on students' behaviour when integrated into a block-based programming environment. We embedded data-driven feature detectors to provide students with immediate positive feedback on completed objectives during programming, and deployed the system in one programming homework in a non-majors CS class. We conducted an expert analysis to determine when the data-driven detectors were correct or incorrect, and investigated the impact of the system on student behavior on the homework, specifically in terms of the time students spent in the system. Our results highlight when data-driven positive feedback helps students, what happens when it goes wrong, and how this impacted students' programming behavior. These case studies can inform the design of future data-driven systems that provide novices with positive feedback to help them persist while learning to program.

Keywords
Snap, Block-Based Programming, Data-Driven Hints, Positive Feedback, Adaptive Feedback

1. INTRODUCTION
Block-based programming environments are intended to provide novices with the ability to engage in motivating, open-ended, and creative programming, with features that limit syntax errors and allow simplified programming for interactive media [3, 2]. Some of these environments also include automated support such as misconception-driven feedback [4], next-step hints [6], or adaptive feedback [5], which have been shown to improve students' learning. Recently, data-driven automated hints and feedback [13, 12] are being explored because they can be generated automatically from historical or current log data with reduced involvement of experts. Researchers have investigated varied methods to generate automatic feedback (for example, hint and feedback generation from historical data [5], or next-step hints from the current code log [9]) and have also explored the quality of feedback and its impact on students' learning [15, 12, 6].

While prior work has evaluated the quality of automated feedback and its impact on students' performance or learning, not much is known about the impact of this feedback on students' programming behaviour when it fails to be reliable. This needs more investigation to understand what measures can be taken, or what support should be provided to students, to mitigate the adverse effects of data-driven automated feedback. Our prior study showed that data-driven positive feedback increased students' engagement with the programming task and improved their programming performance [5]. In this paper we present case studies of specific instances where our data-driven positive feedback system helped students to complete a programming assignment. Since the feedback is based on detecting objectives extracted from previous students' correct solutions and has no way to adapt to new behaviour, it is not always perfect. Thus, we also explore instances where the system either failed to confirm students' correct steps or provided misleading feedback, and we investigate students' responses to such events in terms of their programming behaviour and time spent on the task.
2. RELATED WORK
Several block-based programming environments have been designed to reduce the difficulties students face while learning a new programming language. For example, Alice [2] and Snap [3] provide drag-and-drop coding and immediate visual code execution. Research has shown that these environments are more engaging in terms of reduced idle time while solving a programming problem [8], and that they can produce positive learning outcomes in terms of grades [2] and the number of goals completed in a fixed amount of time [8].

To provide novice students with individualized tutoring support, researchers have integrated intelligent features into block-based programming environments. These features dynamically adapt teaching support to meet personalized needs [7]. For example, iSnap [10] is an extension of Snap that provides on-demand hints generated from students' code logs using the SourceCheck algorithm [11]. Gusukuma et al. [4] integrated automatic feedback based on learners' mistakes and underlying misconceptions into BlockPy [1], and showed that it significantly improved students' performance. Such data-driven approaches are being integrated to provide more automated, adaptive tutoring support in novice programming environments. For example, iSnap showcased the first attempt to integrate data-driven support into a block-based programming environment. Zhi et al. [13] proposed a method of generating example-based feedback from historical data for iSnap: they extracted correct-solution features from previous students' code, used those features to remove extraneous code from the current student's code, and produced pairs of example solutions that were provided on demand.

The impact and effectiveness of tutoring support and intelligent features integrated into novice programming environments have been explored from various perspectives. Zhi et al. [15] demonstrated the adoption of worked examples in a novice programming environment and found that worked examples helped students complete more tasks within a fixed period of time, though not significantly more. Price et al. [12] explored how the quality of contextual hints generated from students' current code affects help-seeking behaviour; they found that students who used hints at least once generally did not perform poorly, and that the quality of the first few hints is positively associated with future hint use and correlates with hint abuse. Marwan et al. [6] evaluated the impact of automated programming hints on students' performance and learning, and argued that automated hints improved learning on subsequent isomorphic tasks when accompanied by self-explanation prompts.

In this paper, we adopt a case-study approach to explore the positive impact of data-driven programming feedback when it is generated accurately in a block-based novice programming environment, and the negative impacts that can occur when the system fails, to shed light on the influence of such feedback on novice students' programming behaviour.

3. SYSTEM DESIGN
3.1 The Novice Programming Environment
We built the data-driven positive feedback (DDPF) system in iSnap [10], a block-based intelligent novice programming environment. This environment provides students with on-demand hints and logs all students' edits while programming (e.g. adding or deleting a block) as a code trace. This logging feature allows researchers or instructors to replay all of a student's edits in the programming environment and to recover the time of each edit.

3.2 Data-Driven Positive Feedback (DDPF) System
We built a system to provide positive feedback while students program a specific exercise in the block-based programming environment, using the data-driven feature detection algorithm described in [14]. This algorithm detects features, i.e. sequences of code blocks that reflect properties of correct solutions, from previous students' data. We used the 7 features [Table 1, Column 2] extracted with this algorithm from data on one programming exercise solved in the environment. The system converts the snapshot of each edit a student makes into an abstract syntax tree (AST) to detect completed features, and generates a sequence of 0s and 1s called the feature state (e.g. 1100000, where the first two 1s indicate the presence of the first two features and the 0s indicate the absence of the remaining features) for each student snapshot.
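To make the feature-state representation concrete, the sketch below shows one way such a bit string could be computed. It is a minimal Python illustration, not the detector from [14]: the features here are hand-written predicates over a simplified snapshot (a set of block labels) rather than sub-sequences learned from prior students' ASTs.

# Minimal sketch of turning one code snapshot into a feature state.
# The real detector in [14] learns features from prior students' data;
# here each "feature" is a hand-written predicate over a snapshot, and
# the snapshot is represented as a plain set of block labels.

def feature_state(snapshot_blocks, features):
    """Return a bit string like '110': 1 if a feature is present, else 0."""
    return ''.join('1' if detect(snapshot_blocks) else '0'
                   for detect in features)

# Hypothetical predicates standing in for the learned features.
FEATURES = [
    lambda b: 'custom_block' in b,         # e.g., a Squiral procedure exists
    lambda b: 'repeat' in b,               # some loop is used
    lambda b: {'move', 'variable'} <= b,   # move uses a variable
    # ... the remaining features would follow the same pattern
]

if __name__ == '__main__':
    snapshot = {'custom_block', 'repeat', 'move'}
    print(feature_state(snapshot, FEATURES))   # -> '110'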
3.3 Positive Feedback Interface
We designed an interface, based on our prior work [5], to provide positive feedback using the DDPF system described above. The interface includes a progress panel that displays a set of four objectives students need to complete to finish the programming task. Two experts in block-based programming converted the 7 data-driven features into these four objectives, each with a meaningful description to be displayed in the progress panel. While a student is programming, after each edit the DDPF system detects the feature state of the current snapshot and updates the corresponding objectives in the progress panel accordingly. Initially, all the objectives in the progress panel are deactivated. Once the system detects the presence of a feature, the color of its corresponding objective changes to green; if it detects the absence of a feature that was present before (i.e. a broken feature), the corresponding objective turns red.
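The panel update rule can be sketched as a comparison of consecutive feature states. The snippet below is an illustrative sketch only: it assumes a one-to-one mapping between features and displayed objectives, whereas the deployed system maps the 7 features onto 4 objectives.

# Sketch of the progress-panel update rule described above (illustrative only):
# an objective turns green when its feature appears, and red when a feature
# that was present before disappears (a "broken" feature).

def update_panel(prev_state, curr_state, panel):
    """panel maps objective index -> 'inactive' | 'green' | 'red'."""
    for i, (was, now) in enumerate(zip(prev_state, curr_state)):
        if now == '1':
            panel[i] = 'green'            # feature detected
        elif was == '1' and now == '0':
            panel[i] = 'red'              # previously completed, now broken
    return panel

panel = {i: 'inactive' for i in range(4)}    # all objectives start deactivated
panel = update_panel('0000', '1100', panel)  # two objectives completed
panel = update_panel('1100', '1000', panel)  # second objective broken
print(panel)   # {0: 'green', 1: 'red', 2: 'inactive', 3: 'inactive'}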
4. PROCEDURE
We deployed our system in an introductory computing course for non-majors at a public research university in the United States, in a Spring 2020 class of 27 students. In this course, students used iSnap (Section 3.1) to solve their in-class programming assignments and homeworks. We integrated our DDPF system into the programming environment for one homework, called Squiral, described in Section 4.1. The data we collected consist of code snapshots for every edit to student code, with corresponding timestamps. We also logged the objective feature states of all students' code snapshots, with the time when every objective was completed or broken. Afterwards, we manually checked the sequential code snapshots for each student and documented the following:

Early, Late, Incorrect and Just-In-Time Objective Detection: We investigated the code snapshots for each student and filtered out the snapshots where the system detected the completion of an objective. Two researchers evaluated whether each detection was early (the objective was detected before the student completed it), late (the student completed the objective earlier than when it was detected by the system), incorrect (an objective was detected that was never completed), or just-in-time (the objective was detected at the step where the student had just completed it).

Agreement between Researchers and Automatic Objective Detection: To measure the agreement between researchers and the data-driven objective detection, we marked each step of a student's code as: true positive (TP), where both the researchers and the system detected the completion of an objective at the same step; true negative (TN), where both the researchers and the system detected that an objective was broken at the same step; false positive (FP), where the system detected the completion of an objective that was not detected by the researchers; or false negative (FN), where the system detected an objective as broken at a step where the researchers detected no broken objective.

Idle and Active Time: We measured the total active and idle time spent by each student in the system while solving the programming homework. We also measured, for each student, the active and idle time spent before each objective was detected by the system, and the total active and idle time spent before the last change to any objective was detected by the system. We considered a time gap of more than 3 minutes [the 75th percentile of the frequency distribution of time gaps] to be idle time. A time gap of more than 10 minutes [the 95th percentile] was considered the start of a new session and thus was not counted towards either active or idle time.
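As an illustration of this accounting, the sketch below classifies the gaps between consecutive edit timestamps using the same 3-minute idle and 10-minute new-session thresholds. The input format (timestamps in minutes) and the function name are our own simplifications, not the actual logging format.

# Sketch of the active/idle time accounting described above.
# Gaps of <= 3 minutes count as active time, gaps of 3-10 minutes as idle
# time, and gaps of > 10 minutes start a new session and count toward neither.

IDLE_GAP = 3.0      # minutes (75th percentile of observed gaps)
SESSION_GAP = 10.0  # minutes (95th percentile of observed gaps)

def active_idle_time(edit_times_min):
    """edit_times_min: sorted timestamps of a student's edits, in minutes."""
    active, idle = 0.0, 0.0
    for prev, curr in zip(edit_times_min, edit_times_min[1:]):
        gap = curr - prev
        if gap > SESSION_GAP:
            continue                 # new session: dropped from both totals
        elif gap > IDLE_GAP:
            idle += gap
        else:
            active += gap
    return active, idle

print(active_idle_time([0, 1, 2, 6, 6.5, 20, 21]))  # -> (3.5, 4.0)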
4.1 The Squiral Assignment
The Squiral assignment is a programming homework that asks students to create a procedure to draw a spiraling, square-like shape. One possible solution and its corresponding output are depicted in Figure 1. Using correct student solutions collected from prior semesters, four objectives were identified [described in Section 3.3] and were provided to the students as sub-goals to achieve while solving the problem. The specific features required to complete each objective are shown in Table 1.

Figure 1: a) A sample expert solution to solve the Squiral assignment; b) Expected output

Table 1: Sample requirements to complete each objective (objective number and label; required features for completion)
Objective 1: Make a Squiral custom block and use it in your code [similar to creating a function and using it in the program]. Required features: create and use a custom block.
Objective 2: The Squiral custom block rotates the correct number of times. Required features: a nested loop, either repeat y * z, or repeat y inside repeat z, where y = rotation count and z = 4.
Objective 3: The length of each side of the Squiral is based on a variable. Required features: within the loop, move x steps, where x = length of a side.
Objective 4: The length of the Squiral increases with each side. Required features: pen down [outside the loop]; within the loop: 1. move x steps, 2. turn 90 degrees, 3. change x by some value.
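To make the objectives in Table 1 concrete, the sketch below draws a Squiral using Python's turtle module. It mirrors the structure the objectives describe (a custom procedure, rotations x 4 repetitions, a move by a variable side length, a 90-degree turn, and a growing length), but it is only an illustration: the parameter names and growth amount are our own choices, and the expert solution in Figure 1 is written in Snap blocks, not Python.

# Illustrative Squiral, mirroring the four objectives in Table 1
# (a custom procedure, rotations x 4 repetitions, a variable side length
# that grows each iteration, and a 90-degree turn per side).
import turtle

def squiral(rotations, length, growth=2):
    turtle.pendown()
    for _ in range(rotations * 4):   # objective 2: correct number of turns
        turtle.forward(length)       # objective 3: side length is a variable
        turtle.right(90)
        length += growth             # objective 4: length grows each side

if __name__ == '__main__':
    squiral(rotations=25, length=10)  # objective 1: define and call the block
    turtle.done()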
4.2 Research Questions
The goal of this study is to explore the impact of automated data-driven adaptive feedback on students' programming behavior. To achieve this goal, we aimed to answer the following research questions:
RQ1: How it Helps. How did data-driven feedback help students complete the assignment?
RQ2: What Happens when it Goes Wrong. How did the objective detectors impact student behavior, especially with regard to differences between researcher and algorithmic detection of objective completion?
RQ3: How Students Respond. How did data-driven feedback impact students' active and idle time while solving a programming problem?

5. RQ1: HOW IT HELPS
When we observed each student's solution at the end of their attempt, we found that 25 out of 27 students who were provided feedback had working solutions to the Squiral assignment, although, according to researchers, their solutions were not always perfect from a logical perspective. [For example, using 'size x size' in the nested loop instead of 'rotations x 4', where 'size' = 10 and 'rotations' = 25, will produce the same Squiral; but it is not logically correct, since the purpose of the nested loop is to draw 4 sides of the Squiral in each rotation.] Among the 27 students, two students (Jade and Lime) attempted to solve Squiral in the system both with and without feedback. To get specific insight into how the system helped students reach a correct solution, we examined the code logs of these two students, who each first attempted to solve Squiral without data-driven feedback and failed to complete the assignment, and later attempted it with data-driven feedback and succeeded. In this section, we present the case studies of these two students to demonstrate how the feedback system helped them fill the gaps in their code and led them to working solutions.

5.1 Case Study Lime
Student Lime without Data-Driven Positive Feedback: When attempting to solve Squiral without any hints or feedback [Figure 2], student Lime used a custom block with a parameter, and used 'move' and 'turn' statements within a loop in the custom block. However, there were three gaps in the code that the student could not figure out. First, a nested loop was required to iterate 'rotation count x 4' times. Second, the move statement used 'length x 2' as its parameter, whereas 'length' alone would have been sufficient. Finally, the variable used in the 'move' statement needed to be incremented at each iteration. The student spent 18 minutes and 18 seconds before giving up, unable to resolve these issues.

Figure 2: Student Lime's Solution when no Feedback was given

Student Lime with Data-Driven Positive Feedback: Later, student Lime attempted the homework again after receiving notice that the DDPF system was available. With feedback, Lime figured out the three issues and reached a correct solution [Figure 3]. The second objective suggests that a correct number of rotations needs to be used within the custom block; with this feedback, Lime used '4 x Rotations' in the 'repeat' block instead of '15' and completed the second objective. The third objective suggests the use of a variable in the 'move' statement; Lime used an initialized variable 'length' in the 'move' statement instead of 'length x 2', and the objective was marked green. Finally, Lime incremented 'length' within the loop, all objectives were completed, and they reached a correct solution. With data-driven adaptive feedback, Lime spent 29 minutes 51 seconds before reaching the correct solution. Recall that Lime gave up with an incorrect solution after around 18 minutes when no feedback was given.

Figure 3: Student Lime's Solution when Feedback was Given

5.2 Case Study Jade
Student Jade without Data-Driven Positive Feedback: Student Jade initially attempted to solve Squiral without data-driven positive feedback and spent 16 minutes and 55 seconds before giving up with an incorrect solution. Jade's code [Figure 4] contains 'repeat', 'move', and 'turn' statements on the stage. Jade created a custom block but only used it to initialize a variable, 'length', that was also a parameter to the block. The components needed to complete the objectives were partially present in Jade's code, but it suffered from organizational issues. Jade also could not figure out that the 'move' statement should use a variable instead of a constant and that the same variable needed to be incremented at each iteration. The number of repetitions in the repeat block was also incorrect.

Figure 4: Student Jade's Solution without Feedback

Student Jade with Data-Driven Positive Feedback: Like Lime, Jade attempted the homework again when the DDPF system was provided. When given feedback, Jade first created a custom block and used it on the stage, which got the first objective marked green. The second objective hints at using a loop that repeats the correct number of rotations within the custom block; this time Jade implemented the loop within the block and got the second objective correct. Within the loop, Jade used 'move', 'turn', and 'change' statements and reached the correct solution [Figure 5] with all objectives marked green. With adaptive feedback, it took Jade 14 minutes 29 seconds to reach a correct solution, whereas without feedback Jade gave up with an incorrect solution after spending over 16 minutes on the problem.

Figure 5: Student Jade's Solution when Feedback was Given

5.3 Findings
The two case studies of Lime and Jade presented in this section demonstrate that the feedback system was able to help students fill the gaps in their code to reach a correct solution. In one case this came at the cost of higher active time; in the other, the student reached a correct solution in less time when feedback was provided. We also observed that most students had a working solution [capable of drawing a Squiral] at the end of their attempts, although not one that was logically 100% correct according to researchers. Moreover, almost all (25 out of 27) students explored the syntactic constructs required to complete the objectives (e.g. move, turn, iteration, variables), which potentially indicates that they closely followed the objectives to accomplish the assignment.
6. RQ2: WHAT HAPPENS WHEN IT GOES WRONG
To answer RQ2, we manually walked through the sequential code snapshots of each student's attempt to solve Squiral, generated case studies where the system went wrong, and observed students' problem-solving approach. Below we present three case studies demonstrating our system's potential impact on student behavior when it could not provide reliable feedback. We selected one case where the student completed an objective but the system could not detect it (an FN case), and two cases where the system detected completed objectives when the objectives were not completed (according to researchers), which led students to an incorrect solution or made them stop early.

6.1 Case Study Azure: FN Cases Causing Students to Work More than Necessary
Student Azure started solving Squiral by creating a custom block and got the first objective correct. Azure used two parameters, 'size' and 'length', to denote the number of rotations and the length of the first side of the innermost loop. They created a loop with a 'repeat' block with the correct number of rotations ('size x 4') and got the second objective correct. As Azure used the 'length' parameter in the 'move' statement within the loop, they got the third objective correct. Then Azure added a 'turn' statement and incremented the 'length' variable. At this point, the fourth objective was completed according to researchers. However, the objective went undetected by the system [an FN case], because Azure used a 'turn' statement that was different from those used in the previous students' solutions [from which the objectives were extracted and detected]. According to researchers, Azure's solution was 100% correct at this point [Figure 6]. It took this student only 2 minutes 24 seconds to reach the correct solution.

Figure 6: Correct Solution Initially Implemented by Azure

Because the fourth objective was not detected, Azure kept working on their code and made several changes that led them to an incorrect solution. Finally, Azure ended up submitting a solution that was also 100% correct according to researchers, but slightly different from their initial solution: in the submitted solution, Azure removed the 'length' variable from the parameter list of the custom block. The fourth objective was still undetected. While making these changes, the student spent an additional 12 minutes 5 seconds in the system, almost 5 times the amount of time they had spent to get a correct solution in the first place.

We observed a similar situation for student Blue, who worked for a total of 1 hour 43 minutes 16 seconds. Blue reached a correct solution at 1 hour 7 minutes 11 seconds according to researchers, but one objective (the fourth) went undetected. Blue kept working for another 36 minutes 5 seconds (almost 50% of the time taken to reach the correct solution in the first place).

6.2 Case Study Cyan: FP Cases Leading Students to an Incorrect Solution
Student Cyan created a custom block, used it on the stage, and got the first objective correct. Cyan used two parameters in the custom block and used one of them in a 'move' statement within a nested 'repeat' block, which got the second and third objectives correct. However, Cyan implemented another nested loop and added a 'change' statement within that loop on the stage instead of adding them to the custom block. The system detected the objective and marked the fourth objective green. At this point, student Cyan had all objectives correct [FP cases], but the code [Figure 7] was unable to draw a Squiral.

Figure 7: Incorrect Solution Initially Implemented by Student Cyan

Later, Cyan removed the 'change' statement from the stage, which caused the fourth objective to be broken, while removing the custom block from the stage was breaking other objectives. Cyan then moved the 'change' statement into the custom block and corrected the rotation count in the 'repeat' statement. At this point, the solution was correct and similar to Figure 1a. The incorrectly detected objectives had led Cyan to a non-working solution; in this case, Cyan had to ignore the detectors and do extra work to reach a correct solution.
We observed a similar situation for three other students. One of them got four objectives correct, and the code was able to draw a Squiral, but according to researchers the code had one programmatic problem: it drew three sides of the Squiral using an inner loop and one side manually. The problem identified by researchers remained undetected by the system, and the student ended up submitting a partially correct solution. For the other two students, all four objectives were detected while syntactical problems were still present in their programs, as for student Cyan. These students realized that completing the objectives did not necessarily mean that their code could draw a Squiral. Although the FP cases led the students to incomplete code with four checked objectives, the incorrect output eventually compelled each student to modify their code and reach a 100% correct solution in the end.

6.3 Case Study Indigo: FP Cases Causing Students to Stop Early at a Partially Correct Solution
We found 6 additional cases where students got 4 objectives correct but the code was only partially correct (FP cases) according to researchers. However, their programs were able to draw a Squiral as required. In these cases, the students finished the attempt early and submitted the partially correct solution. We present the case study of student Indigo here.

Student Indigo's solution [Figure 8] had objectives 1, 3, and 4 correctly completed according to researchers, and these objectives were detected by the objective detection system as well. Indigo created a custom block and used it on the stage [required to complete objective 1]. They used 'pen down' and added 'move', 'turn', and 'change' statements accordingly [required to complete objective 4] within a nested loop implemented with two 'repeat' statements. In the 'move' statement, Indigo used an initialized variable, 'Length' [required to complete objective 3], and incremented the value of 'Length' by 10 at each iteration. However, the rotation count used in the nested loop was wrong: one of the 'repeat' statements should have the count of rotations and the other should have a constant 4, indicating the 4 sides of the square drawn at each rotation. The objective detection system detected the use of the nested loop and marked objective 2 green [an FP case]. The code was able to draw a Squiral, but the implementation was not completely correct. Still, once student Indigo got 4 objectives correct, they submitted their solution.

Figure 8: Solution Submitted by Student Indigo

6.4 Findings
We observed cases where students reached a correct solution but could not recognize it as correct, because their completed objectives were not detected by the system; they continued working on the assignment for longer than necessary. We also observed cases where students got 4 objectives correct but their solution was not even working, i.e. their code was not able to draw a Squiral because of organizational or syntactical problems. Only in these cases did the students realize that completing the objectives is not enough; they then modified the code, overriding the objective detectors, to reach a working solution. Our third case study showed that students who got 4 objectives correct with working code submitted their solutions even if their code was not fully correct according to researchers. In these cases, all the students relied on objective detection and submitted partially correct solutions. These case studies potentially indicate students' high reliance on the objective detection feedback, since they did not seem to question the feedback system as long as they had a working program that drew the correct shape. However, students did not rely on their own skill to determine when their code was correct. Furthermore, in the cases when objectives went undetected, students sometimes even ignored the produced output showing that they had a working solution.
7. RQ3: HOW STUDENTS RESPOND
To answer RQ3, we investigated how the correctness of objective detection in the early phases of a problem-solving attempt impacted the later phases of the attempt. We divided the total time each student spent in the system into three phases [Figure 9]: a) Phase A, in which objectives were detected for the first time; b) Phase B, in which changes in previously detected objectives were detected; and c) Phase C, in which students spent time in the system but no change in the objectives was detected. For example, a student got objectives 1, 3, and 4 marked green within 20 minutes of starting the attempt [Phase A]. The student then continued working for another 10 minutes [Phase B] without seeing a new objective [objective 2] go green, although the previously detected objectives were broken and corrected several times. Finally, the student spent another 5 minutes [Phase C] during which no change in any of the objectives was detected. We related the correct, incorrect, early, and late detection ratios in Phase A to the active and idle time spent in Phases A, B, and C, to understand whether correct, incorrect, early, or late detection regulates the time or effort students put into the assignment.

Figure 9: Phases in a Student's Attempt to Solve Squiral
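The phase boundaries described above can be computed from the logged objective events. The sketch below is a simplified version of that bookkeeping under our own assumed event representation (a list of (minute, kind) pairs); it reproduces the worked example in the text (a 20-minute Phase A, a 10-minute Phase B, and a 5-minute Phase C).

# Sketch of splitting one student's attempt into Phases A, B, and C.
# events: list of (minute, kind) with kind in {'first_detect', 'change', 'edit'};
# 'first_detect' marks the first time an objective was detected,
# 'change' marks a later break or re-completion of an already detected objective.

def phase_durations(events, end_minute):
    first_detects = [t for t, k in events if k == 'first_detect']
    changes = [t for t, k in events if k == 'change']
    a_end = max(first_detects) if first_detects else 0.0
    b_end = max(changes) if changes and max(changes) > a_end else a_end
    return {'A': a_end,               # until the last first-time detection
            'B': b_end - a_end,       # until the last change to a detected objective
            'C': end_minute - b_end}  # remaining time with no objective changes

events = [(5, 'first_detect'), (12, 'first_detect'), (20, 'first_detect'),
          (25, 'change'), (30, 'change')]
print(phase_durations(events, end_minute=35))   # -> {'A': 20, 'B': 10, 'C': 5}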
Table 2: Active and Idle Time range before Objectives got detected for the first time
Objective 1: Active time ~0-12.45 min.; Idle time 0-9.27 min.
Objective 2: Active time 0.1-84.5 min.; Idle time 0-10.1 min.
Objective 3: Active time 0.4-17.9 min.; Idle time 0-13.25 min.
Objective 4: Active time 0.5-18.2 min.; Idle time 0-19 min.

Active and Idle Time Observed in Phase A: In this phase, objectives were detected for the first time by the system. Students took a wide range of time before an objective was detected [Table 2]. We plotted average active and idle time against correct objective detection ratios and observed that students with a higher correct objective detection ratio had a shorter Phase A in terms of active time [Figure 10a]. For one student, this phase did not occur because an objective was never detected in the student's code. In this phase, only a few cases of idle time were found: 16 out of 27 students had no idle time at all, 7 students had idle time ranging from 3-5 minutes, and the remaining 4 students had idle time ranging from 10-24 minutes. We observed that a higher early detection rate (over 25%) shows a decreasing trend in average idle time [Figure 10b]. This may potentially indicate that positive feedback can be motivating to students, even if it is provided early.

Figure 10: a) Active Time in Phase A against Correct Objective Detection Ratio; b) Idle Time in Phase A against Early Objective Detection Ratio

Active and Idle Time Observed in Phase B: For 11 students, Phase B did not occur at all, either because an objective was never detected or because the students submitted their code after all of the objectives were detected for the first time in Phase A. 10 students spent more than 0 but less than 10 minutes in Phase B, and 6 students spent 10-35 minutes. 4 of the 6 students who spent a higher amount of time in this phase had a high early detection ratio in Phase A (50-75%), and 1 student had a high incorrect detection ratio (50%). These students did not have correct solutions, even though some or all of the objectives were detected in Phase A. In this phase, 21 out of 27 students had no idle time; 6 students had idle time ranging from 3 to 24 minutes. When we plotted average active and idle time in Phase B against the correct and incorrect objective detection ratios in Phase A, we observed that a higher correct detection rate (>25%) in Phase A seemed to decrease the active time spent in Phase B [Figure 11a]. This means that correct detections helped students complete their programs more quickly: correct objective detections in Phase A pushed the students towards the end of their attempt. However, incorrect objective detection in Phase A decreased idle time in Phase B and caused the students to continue actively working [Figure 11b].

Figure 11: a) Active Time in Phase B against Correct Objective Detection Ratio in Phase A; b) Idle Time in Phase B against Early Objective Detection Ratio in Phase A

Active and Idle Time Observed in Phase C: This phase covers the time when no change was detected in any of the objectives. In Phase C, one of the following scenarios occurs: 1) most of the objectives were detected in earlier phases, the student had a working solution, and they were making minor modifications without impacting the objectives; or 2) one, more than one, or all of the objectives were undetected, and the student was working on the assignment but submitted the attempt without another objective being detected. We observed that when the first scenario occurred, students spent only a few minutes in Phase C and then submitted their code: 18 out of 27 students spent only 0.1-8 minutes in the system after scenario 1 occurred. Even if the objective detection was wrong, these students relied on the system and submitted their code. We also observed that a higher early detection ratio in Phase A led to decreased average active time in Phase C [Figure 12]. The remaining 9 students spent 12-56 minutes in this phase. Scenario 2 played out for 7 of these 9 students, who all submitted the program with some incomplete objectives.

Figure 12: Active Time in Phase C against Early Objective Detection Ratio in Phase A

7.1 Findings
The results of our analysis showed that the active and idle time spent in Phase B and Phase C are, in some cases, associated with the quality of detection in Phase A. We observed that correct objective detection in Phase A that led to a working solution pushed students to finish their attempt, making Phases B and C shorter, whereas incorrect objective detection in Phase A that led to a non-working solution decreased the idle time observed in Phase B and caused students to work more. However, the active time in such cases varied from student to student and depended on the extent to which objective detection went wrong. Our case studies in Section 6 indicate students' significant reliance on the feedback. This reliance interacted differently with the system depending on whether the feedback was correct or incorrect and whether or not the student's code output appeared correct, and these differences are reflected in students' responses and effort in terms of active and idle time.
8. DISCUSSION
Our case studies and analysis demonstrated that, although our feedback system could guide students to complete a programming assignment, it could also mislead students into doing extra work, submitting a partially correct solution, or ending up with a non-working solution when it provided inaccurate feedback. However, the visual feedback of objectives turning green [even for an incorrect or too-early detection] reduced idle time, giving us an indication that the feedback was motivating for students. All of these observed impacts could be the result of students' high reliance on the feedback system. To prevent the negative impacts generated by incorrect detections, mechanisms to prevent such detections must be explored. We observed our system fail in the presence of new student behaviours: since our interventions use data from prior students, and new students behave in new ways, the system has not had the opportunity to learn and adapt to their behavior. Thus, any system like ours must have an iterative process for integrating new behaviors that may arise from diversity in students and instructors.

9. CONCLUSION AND FUTURE WORK
This paper presents case studies that provide important insights into the impacts of positive feedback on novice programmers from multiple perspectives. We present case studies that shed light on how the feedback system helped two students to complete a programming task after they had failed to complete it on their first attempts without feedback. While these scenarios highlighted the usefulness of positive feedback, our case studies of events when the system could not provide accurate feedback offer insights into the impact feedback failures can have on students' responses. These insights can be highly useful for deciding on measures to mitigate adverse impacts or for formulating adaptations to handle unexpected behaviors; this may involve expressing a confidence level for detectors, or inviting students to self-explain how and why their solutions are correct. The primary contributions of this work are: 1) case studies demonstrating how the positive feedback system can help students produce a working solution for a programming task; and 2) case studies and code-trace-based analyses that give important insights into how a data-driven positive feedback system impacts students' behaviour when the system goes wrong. Our results show interesting relationships between the correctness of the provided feedback and the time students spent on the task or in the system. In future work, we plan to explore these impacts in larger controlled studies and on other programming tasks, and to explore how we can adapt our system to balance students' understanding of their own code with reliance on feedback, to promote learning.

10. ACKNOWLEDGMENTS
This material is based upon work supported by the National Science Foundation under grant 1623470.
11. REFERENCES
[1] A. C. Bart, J. Tibau, E. Tilevich, C. A. Shaffer, and D. Kafura. BlockPy: An open access data-science environment for introductory programmers. Computer, 50(5):18–26, 2017.
[2] W. Dann, D. Cosgrove, D. Slater, D. Culyba, and S. Cooper. Mediated transfer: Alice 3 to Java. In Proceedings of the 43rd ACM Technical Symposium on Computer Science Education, pages 141–146, 2012.
[3] D. Garcia, B. Harvey, and T. Barnes. The beauty and joy of computing. ACM Inroads, 6(4):71–79, 2015.
[4] L. Gusukuma, A. C. Bart, D. Kafura, and J. Ernst. Misconception-driven feedback: Results from an experimental study. In Proceedings of the 2018 ACM Conference on International Computing Education Research (ICER '18), pages 160–168, 2018.
[5] S. Marwan, G. Gao, S. Fisk, T. W. Price, and T. Barnes. Adaptive immediate feedback can improve novice programming engagement and intention to persist in computer science. In Proceedings of the International Computing Education Research Conference (forthcoming), 2020.
[6] S. Marwan, J. Jay Williams, and T. Price. An evaluation of the impact of automated programming hints on performance and learning. In Proceedings of the 2019 ACM Conference on International Computing Education Research, pages 61–70, 2019.
[7] T. Murray. Authoring intelligent tutoring systems: An analysis of the state of the art. 1999.
[8] T. W. Price and T. Barnes. Comparing textual and block interfaces in a novice programming environment. In Proceedings of the Eleventh Annual International Conference on International Computing Education Research, pages 91–99, 2015.
[9] T. W. Price, Y. Dong, and T. Barnes. Generating data-driven hints for open-ended programming. International Educational Data Mining Society, 2016.
[10] T. W. Price, Y. Dong, and D. Lipovac. iSnap: Towards intelligent tutoring in novice programming environments. In Proceedings of the 2017 ACM SIGCSE Technical Symposium on Computer Science Education, pages 483–488, 2017.
[11] T. W. Price, R. Zhi, and T. Barnes. Evaluation of a data-driven feedback algorithm for open-ended programming. In Proceedings of the International Conference on Educational Data Mining, 2017.
[12] T. W. Price, R. Zhi, and T. Barnes. Hint generation under uncertainty: The effect of hint quality on help-seeking behavior. In International Conference on Artificial Intelligence in Education, pages 311–322. Springer, 2017.
[13] R. Zhi, S. Marwan, Y. Dong, N. Lytle, T. W. Price, and T. Barnes. Toward data-driven example feedback for novice programming.
[14] R. Zhi, T. W. Price, N. Lytle, Y. Dong, and T. Barnes. Reducing the state space of programming problems through data-driven feature detection. In EDM Workshop, 2018.
[15] R. Zhi, T. W. Price, S. Marwan, A. Milliken, T. Barnes, and M. Chi. Exploring the impact of worked examples in a novice programming environment. In Proceedings of the 50th ACM Technical Symposium on Computer Science Education, pages 98–104, 2019.