Improving Short and Long-term Learning in an Online Homework System

Ben Prystawski, University of Toronto, ben.prystawski@mail.utoronto.ca
Jacob Nogas, University of Toronto, jacob.nogas@mail.utoronto.ca
Andrew Petersen, University of Toronto, andrew.petersen@utoronto.ca
Joseph Jay Williams, University of Toronto, williams@cs.toronto.edu

ABSTRACT
Online homework systems are common in university courses. While scientific findings about learning could have bearing on how instructors design these systems, there is little guidance available for instructors on the problem of extrapolating scientific results in various contexts to make design decisions in specific settings. This paper leverages the value of online environments to conduct randomized experiments that directly test principles in a real-world introductory programming course. We investigate the relative benefit of giving students explanations of the correct solution to a problem and giving them an additional problem. We find suggestive evidence that students do better on subsequent problems in the same exercise when given an explanation, but they do better on a post-test two weeks later when given an additional practice problem. These results can inform instructors' decisions in designing online homework.

[Figure 1: Screenshot of the problem on which the student supports were deployed in the PCRS homework system]

Keywords
education; programming; explanation; online homework; experiment; MOOC

1. INTRODUCTION AND RELATED WORK
Many university courses use online homework systems to give students practice material. These systems enable students to conveniently practice their skills and instructors to automatically grade homework and gather data on student performance. They typically consist of problems with either written or multiple-choice responses for students to complete.

Past research has investigated the effects of these supports on student learning. There is experimental evidence, for example, that explanations and practice problems can help student learning under certain circumstances [5, 6, 8]. However, instructors still must solve the considerable problem of how to translate this research to a real-world setting.

While much work in educational data mining has focused on extracting and analyzing data students generate as they naturally interact with these systems, we diverge from that trend in this paper by deliberately embedding a randomized experiment in an online homework environment. We believe that randomized experiments can provide valuable insight for computing education researchers and practitioners because they can directly test the impact of different educational interventions.

The ICAP framework [2, 3] provides a theoretical basis for thinking about different educational methods by grouping them into levels based on the depth of students' engagement with the material. The levels are, from most to least engaging: interactive, constructive, active, and passive methods. Using this framework, one might expect the active learning approach of solving problems to be more effective for students' understanding than the passive approach of reading more explanations. Students must engage with practice problems at a deeper level than explanations, so practice problems might produce better learning outcomes. Comparing the effectiveness of these two methods enables us to test the active-passive boundary within the ICAP framework.
Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
[Figure 3: Names of problems in week 10 (top) with corresponding problems in week 12 (bottom). Blue lines indicate correspondence between problems. Problems with the same name are identical, while problems ending in "v2" are very similar but with minor differences such as different variable names. Learning supports were all deployed on the problem "Repeat Chars" in week 10.
Week 10: Repeat Chars | First Fifty | Nested For and Range | How Many - Two For Loops | How Many - Nested For Loops
Week 12: Repeat Chars | Repeat Chars v2 | First Fifty | Nested For and Range v2 | How Many - Nested For Loops]
[Figure 2: An additional problem given to some students as a follow-up exercise in the online homework system]

While the ICAP framework might lead one to predict that an additional problem should be more helpful to learning than an explanation, this could be confounded by the fact that the additional problem is optional. If students spend enough time thinking about and attempting the problem, it should improve their understanding beyond the improvement they would see from reading an explanation of the solution. However, it is also possible that students will dedicate less time and attention to the additional problem than they would to the explanation, as trying to solve a problem is a more daunting task than reading an explanation. Furthermore, there is the variable of time to improvement. Perhaps students will not see any immediate benefit from trying an additional problem, but doing it will help their learning in the longer term by cementing their understanding of the concept the problem tests. Will students benefit from additional homework immediately, or will the improvement affect how well the student remembers that week's material later? Both hypotheses appear plausible. Likewise, one might expect that additional explanations will not have a significant effect on student learning. They fit into the passive category of the ICAP framework, which is the lowest level of engagement. The explanation is also optional to read, so students might ignore it entirely. However, one might also expect that reading a well-written explanation of a concept will deepen a student's understanding of the concept they are being tested on. Furthermore, it could be the case that students forget the explanation as soon as they finish working on the exercise, but it could also improve their understanding over a longer period of time, similar to how they remember what they learned from lectures when completing homework.

There is evidence that providing students with instructional explanations when they are solving problems can benefit learning, which is intuitive. However, these explanations are not always effective, especially when they merely give away the answer rather than help students come to see how to solve a problem [8, 11]. For instance, when learners already have some knowledge about a subject, providing additional explanations instead of other knowledge-reinforcing activities can be detrimental to learning [12]. There has also been considerable research on prompting students to write their own explanations in a laboratory setting, finding that having students write their own explanations of key course concepts can help learning [10, 4].

Similarly, the effects of solving problems on learning can be varied and complex under different circumstances. For instance, there is a large body of research on the design of intelligent tutoring systems, which automatically determine which practice problems to show learners and in what order to improve their understanding most effectively [1]. However, mathematics and computer science education research point to the challenges in assuming additional practice of problems is always helpful, as sometimes it is a poor use of students' time or leads them to focus on procedural knowledge instead of understanding the underlying principles [7, 9].

These studies have shown that even in a controlled laboratory environment, the effects of these intuitively helpful interventions vary significantly. Instructors seeking to apply these findings to their courses face the problem of translating findings from laboratory experiments into design decisions about what kind of support to provide in online problems and other educational environments.

Many counterintuitive effects have been found in prior education research, so it is essential to empirically study the effects of interventions before recommending them to instructors. In this paper, we extend past literature on the role of reading explanations and solving problems in learning and provide empirical evidence on how these forms of student support affect learning in a real-world setting. Ultimately, we hypothesize that students will perform better on subsequent tests of their understanding when they are shown an additional problem compared to when they are shown an explanation. This hypothesis is motivated by the active-passive boundary from the ICAP framework.
[Figure 4: Effect of giving an explanation on number of attempts to get the right answer on subsequent problems in the same exercise. Short-term difference in # attempts: none = -0.305 (n = 337), short = -0.098 (n = 155), long = -0.095 (n = 156). T-test results: none vs. short: t(490)=-1.24, p=0.215; none vs. long: t(491)=-1.29, p=0.195; short vs. long: t(309)=-0.0263, p=0.979]

[Figure 5: Effect of giving an explanation on number of attempts to get the right answer on the same problem given two weeks later. Long-term difference in # attempts: none = 0.034 (n = 262), short = -0.029 (n = 104), long = -0.062 (n = 112). T-test results: none vs. short: t(364)=0.300, p=0.764; none vs. long: t(372)=0.433, p=0.665; short vs. long: t(214)=0.126, p=0.900]


2. METHODS
The context for the experiment on explanations and additional practice problems was the Programming Course Resource System (PCRS), the online homework system for the introductory computer programming course at the University of Toronto. This course spans one twelve-week semester, and students are given for-credit online homework exercises each week. The problem we deployed the supports on is shown in Figure 1. It asks students to analyze the runtime of a for loop in Python.

A total of 648 students completed the homework in week 10 of the course. 478 of these students also completed the optional follow-up exercise in week 12. There were 5 problems in each week. This choice of weeks ensures that there is considerable delay between the initial intervention and the subsequent measurement, enabling us to measure long-term learning.

In the experiment, after students attempted a homework problem in week 10 of the course, we used a factorial design to independently vary two factors: whether an explanation was provided and whether an additional problem was provided. The experiment was performed in the context of a multiple-choice problem pertaining to run-time analysis, shown in Figure 1. Students were given course credit for completing the main problem, but did not have any direct incentive to read the explanation or attempt the additional problem.

To measure the impact on learning over a longer time frame, we designed a post-test with problems that were either identical to or variants of the problems asked in week 10. We gave these problems to students two weeks after the experiment (week 12). Some follow-up problems were identical to the corresponding week 10 problems, and others had features of the problem changed, such as having a loop executing 50 times rather than 30. Students were not at ceiling performance in the post-test, suggesting that they did not remember the exact answers by week 12, so these were non-trivial measures of learning. Figure 3 shows the names of problems in week 10 and the corresponding problems in the week 12 follow-up activity. All of these problems were focused on analyzing the runtime of Python programs.

2.1 Experimental Factors
We experimentally varied two variables in a factorial experiment. Each time a student submitted an answer to the first problem of the exercise, they were randomly assigned to a condition for the Explanation factor and the Additional Problem factor.

The Explanation factor had three levels: absent (none), short, or long. The short explanation states, “The third answer is correct because the code inside the for loop takes constant steps regardless of len(s) and it will be executed len(s) times.” The long explanation states, “Suppose s = ‘cat’. Then, double = double + ch * 2 will be executed 3 times because the for loop iterates through each character of s (i.e. ‘c’, ‘a’ and ‘t’). Now, suppose s = ‘google’. Then double = double + ch * 2 will be executed 6 times. As you can see, if len(s) doubles, the number of steps also doubles. So, the third answer is correct.”
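For concreteness, the loop under analysis presumably resembled the following sketch; this is a reconstruction from the line double = double + ch * 2 quoted in the explanations, not the exact code shown to students (which appears in Figure 1):

    # Reconstruction of the kind of loop the problem asks about, based on
    # the code line quoted in the explanations (not the exact Figure 1 code).
    def double_chars(s):
        double = ''
        for ch in s:                  # the loop body runs len(s) times
            double = double + ch * 2  # constant number of steps per iteration
        return double

    # Each iteration does constant work, so the total number of steps grows
    # linearly with len(s): doubling len(s) doubles the steps, i.e. O(len(s)).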
The Additional Problem factor had two levels: absent (none) or present (one additional problem that was very similar to the problem students had attempted, in that it asked them to trace through a for loop and determine its time complexity). A screenshot of this problem is shown in Figure 2.¹

¹ These factors were varied in the context of a larger experiment with more factors that will not be described in this paper in the interest of space. We used weighted randomization in favour of not showing students additional activities to avoid overwhelming them with too many activities.
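As an illustration of this weighted randomization, condition assignment can be done with standard-library sampling; the weights below are hypothetical placeholders, not the values used in the experiment:

    import random

    # Hypothetical sketch of weighted random assignment: the 'none' level is
    # favoured so that fewer students receive extra activities. The weights
    # here are illustrative, not the ones used in the study.
    explanation = random.choices(['none', 'short', 'long'], weights=[2, 1, 1])[0]
    additional_problem = random.choices(['none', 'present'], weights=[3, 1])[0]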
[Figure 6: Effect of giving an additional problem on number of attempts to get the right answer on subsequent problems in the same exercise. Short-term difference in # attempts: none = -0.199 (n = 486), additional problem = -0.222 (n = 162). T-test result: t(646)=0.158, p=0.874]

[Figure 7: Effect of giving an additional problem on number of attempts to get the right answer on the same problem given two weeks later. Long-term difference in # attempts: none = -0.079 (n = 367), additional problem = 0.252 (n = 111). T-test result: t(476)=-1.602, p=0.110]


To measure how well a student performs on a problem, we used the number of attempts until the first correct answer. This is simply the number of submissions made before the student selects all of the correct options and none of the incorrect options on the multiple-choice problem. For example, if a student gets the problem correct on their first try, their number of attempts is 1. If they get the first attempt wrong but the second attempt right, their number of attempts is 2.
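In code, this metric is straightforward to compute from an ordered log of a student's submissions; the following is an illustrative sketch (the function name and data layout are ours, not part of PCRS):

    # Illustrative sketch: attempts until the first correct answer, given a
    # chronologically ordered list of booleans (True = correct submission).
    def attempts_until_correct(submissions):
        for i, correct in enumerate(submissions, start=1):
            if correct:
                return i
        return None  # the student never answered correctly

    print(attempts_until_correct([True]))         # 1
    print(attempts_until_correct([False, True]))  # 2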
To measure short-term improvement, we took the difference between the number of attempts on problem 1 of the week 10 exercise and the average number of attempts on the remaining four problems in that exercise. This number can be negative if students did worse on the remaining problems than they did on the problem we deployed the supports on, and the higher the number, the greater the improvement.

To measure improvement on the delayed exercise, we took the difference between the number of attempts on problem 1 of the week 10 exercise and the number of attempts on the exact same problem when presented in the week 12 follow-up.
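Concretely, the two dependent variables can be computed as below; attempts_w10 and attempts_w12_p1 are hypothetical per-student values, not PCRS data structures:

    # Illustrative sketch of the two dependent variables. attempts_w10 holds a
    # student's attempt counts on the five week 10 problems (problem 1 first);
    # attempts_w12_p1 is their attempt count on the same problem 1 in week 12.
    attempts_w10 = [2, 1, 3, 1, 2]   # hypothetical example data
    attempts_w12_p1 = 1

    # Short-term improvement: problem 1 attempts minus the mean of the rest.
    # Positive values mean the student did better on the later problems.
    short_term = attempts_w10[0] - sum(attempts_w10[1:]) / len(attempts_w10[1:])

    # Long-term improvement: problem 1 attempts in week 10 minus attempts on
    # the identical problem in week 12.
    long_term = attempts_w10[0] - attempts_w12_p1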
3. RESULTS AND DISCUSSION
In this section, we first show a lack of evidence for an improvement in performance between the problem we added student support to in week 10 and the same problem given in a follow-up exercise in week 12. Next, we analyze the effects of the Explanation and Additional Problem factors. Our results did not reach the significance threshold of p < 0.05, and as such they should be interpreted with caution. We present suggestive evidence that the explanations were helpful on the same homework exercise (t(490)=-1.24, p=0.215) but not on the follow-up test two weeks later (t(364)=0.300, p=0.764). Finally, we show the reverse trend with the Additional Problem factor: it was not helpful in the same homework exercise (t(646)=0.158, p=0.874) but might have been in the follow-up exercise two weeks later (t(476)=-1.602, p=0.110). We interpret how these results can inform instructors' design choices and address possible limitations of the work.

3.1 Minor improvement on the same problem
Students took only slightly fewer attempts to get the problem correct in week 12 compared to week 10. While they took 2.07 attempts on average to get the answer right in week 10, they took 2.02 attempts to get the answer right in week 12, even though they had completed the same problem just two weeks before. We found little evidence that students improved between solving a problem in week 10 and solving the same problem in week 12 (t(1136)=-0.529, p=0.597). This suggests that students might not have remembered the answer to the problem even though they had already solved it two weeks earlier, meaning that testing them on the same problem in week 12 appears to be a non-trivial measure of their understanding of the same concepts.

3.2 Explanations might have helped in the short term
We found no statistically significant difference between students who were given explanations and those who were not. However, the results suggest that when students were given explanations, they took slightly fewer attempts to get the right answer on subsequent problems than those who did not, regardless of whether they saw a short (t(490)=-1.24, p=0.215) or long explanation (t(491)=-1.29, p=0.195), as shown in Figure 4. However, the effect of seeing explanations was much smaller in the long term, as the sample means were similar in all three conditions. This is shown in Figure 5 and suggests that an explanation in a homework context might be useful only during that homework session. This could have happened because the problems tested a procedural skill, namely runtime analysis. While reading an explanation gives students a clear formula they can apply in subsequent runtime analysis, they might forget that formula when they stop working on their homework and lose the benefit of the explanation.

3.3 Additional Problems might have helped in the long term
Similarly to the Explanation factor, we did not find statistically significant evidence for a difference in means for the Additional Problem factor. However, we still found suggestive evidence that giving an additional problem has an effect in the long term. We did not find evidence for a difference between the performance of students who were shown an additional problem and those who were not on subsequent problems in the same homework exercise, but students who received the additional problem took fewer attempts on the post-test (t(476)=-1.602, p=0.110). These results are shown in Figures 6 and 7, respectively.² This difference might suggest that the value of the additional practice problem was primarily as a memory aid. Doing more problems could have helped students remember the skill they learned better when writing the post-test. If this knowledge is already in their minds when they are doing the exercise, it makes sense that they did not benefit immediately from more practice. However, they might remember more when writing the post-test, which would explain the improvement in performance there.

² After this analysis, we noticed that the control and experimental groups had different variances, which violates an assumption of the standard t-test. We then ran Welch's t-test, which does not assume equal variances, and found t(476)=-1.813, p=0.0712.
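For readers reproducing this analysis, the change from the standard t-test to Welch's test is a single argument in scipy; the arrays below are placeholder data, not the study's:

    from scipy import stats

    # Placeholder data standing in for per-student long-term improvement
    # scores in the two Additional Problem conditions.
    improvement_none = [0.0, -1.0, 1.0, 0.5, -0.5]
    improvement_extra = [1.0, 0.0, 2.0, 0.5, 1.5]

    # Standard t-test (assumes equal variances in the two groups).
    t_std, p_std = stats.ttest_ind(improvement_none, improvement_extra)

    # Welch's t-test: equal_var=False drops the equal-variance assumption.
    t_welch, p_welch = stats.ttest_ind(improvement_none, improvement_extra,
                                       equal_var=False)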

3.4 Limitations
A notable limitation of this work was the lack of statistical significance. However, the results are consistent with each other and align with ideas from the ICAP framework. As such, they suggest a trend that could inform future research. In the interest of open and replicable science, it can be valuable to publish suggestive and negative results that do not meet the threshold for statistical significance. Real-world data is often messy, and suggestive results can reveal crucial new directions for analysis.

Another limitation of this work is that the problems in the week 12 follow-up were not all identical to those in the week 10 homework. They tested the same concepts and some were exact copies, but others were slight modifications of problems on the original homework. Therefore, the observed results might be due to the supports having different degrees of relevance to the problems in week 10 and week 12 rather than to the duration between support and post-test. We mitigated this by using the differences between the number of attempts on the problem we applied the explanations and additional problem to and the relevant subsequent problems as dependent variables: if an intervention improves students' scores on problem 1 in both week 10 and week 12, the changes to both scores cancel out when the difference is computed.

One might also raise the concern that we had different sample sizes in different conditions. More students were assigned to the "none" condition than to the other conditions for both the Explanation and Additional Problem factors. We intentionally weighted the randomization in this way to minimize the burden on students of having too many additional activities, a strategy also used in randomized clinical trials in medicine.

Considering that the effects of reading explanations and solving problems might vary widely with context, such as the week of a course in which supports were given, it is unclear how broadly the trends we identify in our data apply. While it appears possible that giving students more practice problems helps them develop lasting procedural knowledge of how to analyze the runtime of an algorithm, it is not clear that we can conclude the same about different tasks in computer science education, such as learning the syntax of a programming language or how to design an algorithm.

Finally, the week 12 post-test was optional, so dropout is a concern. While 648 students completed the exercise in week 10, only 478 completed the follow-up post-test in week 12. Therefore, the conclusions we draw about the effects of educational supports in the long term might reflect only the population of students who choose to complete the post-test. Though this was the majority of students, the reported effect could be different if, for instance, the students who are unlikely to do optional homework problems in week 12 are also unlikely to attempt an optional problem given to them in their week 10 homework exercise.

4. CONCLUSION AND FUTURE WORK
Our experiment investigated the effects of explanations and additional problems on performance both on a post-test and on subsequent problems in the same exercise. We found intriguing but not definitive insight into the effects of explanations. The mean number of attempts for students who saw either a short or long explanation was lower than for those who saw no explanation, but this difference was not statistically significant (t(490)=-1.24, p=0.215). It is possible that the explanations we showed students simply did not have an effect on their learning in either the short or the long term. It could be that the explanations used in this experiment did not benefit students as much as they could have, and effort should be directed to designing better explanations. Alternatively, it is possible that the explanations helped students somewhat on the remaining problems in the homework exercise. If this result were replicated in a larger study, it would be interesting because it could guide instructors in deciding how to effectively incorporate instructional explanations into their courses.

In exploring the effect of additional problems, we found that the mean number of attempts on the equivalent post-test problem was lower for students who were shown an additional problem than for those who were not. This difference was not statistically significant, though we found stronger evidence for it than we found for explanations (t(476)=-1.602, p=0.110). As with the explanations, it is possible that the additional problem we gave students was truly not effective, and future work should focus on how to design more effective practice problems. However, if the long-term improvement as a result of the additional problem is replicated in subsequent large-scale experiments, it could provide guidance for instructors in deciding how to incorporate practice problems into their courses effectively.

If the results reported above reflect a real effect, they suggest that explanations are helpful in the short term but not in the long term. Conversely, additional problems are helpful in the long term but not in the short term. This aligns with what one might expect based on the ICAP framework, as solving a problem qualifies as deeper engagement with the learning material than reading an explanation.
Instructors likely care more about whether their students retain information in the long term than whether they understand concepts immediately, so focusing on the long-term learning measure makes sense.

The possible difference between the effects of these interventions is interesting and motivates further research into how the immediate and delayed effects of reading explanations and solving problems might differ. This might help guide instructors in thinking about the trade-offs involved in deciding when to give explanations to students and when to give them more problems.

Future work should investigate how generally this pattern holds. The part of the course we deployed these supports in focused on the procedural skill of reading an algorithm and analyzing its time complexity. Would giving an additional problem still be effective in teaching a different concept in the course, such as the difference between for and while loops? Perhaps additional problems are more helpful in developing procedural knowledge, while good explanations might be more effective in building propositional knowledge. By running similar experiments at different points in the introductory computer science course, we hope to learn more about which types of student support are helpful in developing different skills.

Additionally, we are interested in investigating whether the effects of these interventions differ across subgroups of students. One reason why we might not see a large average effect is that the effectiveness of different forms of support could vary significantly across students. Perhaps, for example, students who take a programming course out of intrinsic interest are more likely to benefit from an additional practice problem than those who take it to satisfy a breadth requirement. By analyzing this experimental data jointly with contextual variables derived from surveys and data mining, we hope to provide a richer picture of which forms of support work for which students and how instructors can tailor interventions more precisely to individual students' needs.

5. REFERENCES
[1] C. J. Butz, S. Hua, and R. B. Maguire. A web-based Bayesian intelligent tutoring system for computer programming. Web Intelligence and Agent Systems: An International Journal, 4(1):77–97, 2006.
[2] M. T. Chi. Active-constructive-interactive: A conceptual framework for differentiating learning activities. Topics in Cognitive Science, 1(1):73–105, 2009.
[3] M. T. Chi and R. Wylie. The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4):219–243, 2014.
[4] J. L. Chiu and M. T. Chi. Supporting self-explanation in the classroom. In Applying Science of Learning in Education: Infusing Psychological Science into the Curriculum, pages 91–103, 2014.
[5] M. Feng, N. T. Heffernan, and J. E. Beck. Using learning decomposition to analyze instructional effectiveness in the ASSISTment system. In AIED, 2009.
[6] R. Hosseini, T. Sirkiä, J. Guerra, P. Brusilovsky, and L. Malmi. Animated examples as practice content in a Java programming course. In Proceedings of the 47th ACM Technical Symposium on Computing Science Education, SIGCSE '16, pages 540–545, New York, NY, USA, 2016. Association for Computing Machinery.
[7] J. Kay, M. Barg, A. Fekete, T. Greening, O. Hollands, J. H. Kingston, and K. Crawford. Problem-based learning for foundation computer science courses. Computer Science Education, 10(2):109–128, 2000.
[8] D. S. McNamara, T. O'Riley, and R. S. Taylor. Classroom based reading strategy training: Self-explanation vs. a reading control. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 28, 2006.
[9] N. Rummel, M. Mavrikis, M. Wiedmann, K. Loibl, C. Mazziotti, W. Holmes, and A. Hansen. Combining exploratory learning with structured practice to foster conceptual and procedural fractions knowledge. Singapore: International Society of the Learning Sciences, 2016.
[10] J. J. Williams and T. Lombrozo. The role of explanation in discovery and generalization: Evidence from category learning. Cognitive Science, 34(5):776–806, 2010.
[11] J. Wittwer, M. Nückles, and A. Renkl. Improving human tutoring by improving tutor-generated explanations. In Avoiding Simplicity, Confronting Complexity, pages 359–368. Brill Sense, 2006.
[12] J. Wittwer and A. Renkl. Why instructional explanations often do not work: A framework for understanding the effectiveness of instructional explanations. Educational Psychologist, 43(1):49–64, 2008.