Improving Short and Long-term Learning in an Online Homework System

Ben Prystawski, University of Toronto, ben.prystawski@mail.utoronto.ca
Jacob Nogas, University of Toronto, jacob.nogas@mail.utoronto.ca
Andrew Petersen, University of Toronto, andrew.petersen@utoronto.ca
Joseph Jay Williams, University of Toronto, williams@cs.toronto.edu

ABSTRACT
Online homework systems are common in university courses. While scientific findings about learning could have bearing on how instructors design these systems, there is little guidance available for instructors on the problem of extrapolating scientific results in various contexts to make design decisions in specific settings. This paper leverages the value of online environments to conduct randomized experiments that directly test principles in a real-world introductory programming course. We investigate the relative benefit of giving students explanations of the correct solution to a problem and giving them an additional problem. We find suggestive evidence that students do better on subsequent problems in the same exercise when given an explanation, but they do better on a post-test two weeks later when given an additional practice problem. These results can inform instructors' decisions in designing online homework.

Keywords
education; programming; explanation; online homework; experiment; MOOC

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. INTRODUCTION AND RELATED WORK
Many university courses use online homework systems to give students practice material. These systems enable students to conveniently practice their skills and instructors to automatically grade homework and gather data on student performance. They typically consist of problems with either written or multiple-choice responses for students to complete.

While much work in educational data mining has focused on extracting and analyzing data students generate as they naturally interact with these systems, we diverge from that trend in this paper by deliberately embedding a randomized experiment in an online homework environment. We believe that randomized experiments can provide valuable insight for computing education researchers and practitioners because they can directly test the impact of different educational interventions.

Past research has investigated the effects of these supports on student learning. There is experimental evidence, for example, that explanations and practice problems can help student learning under certain circumstances [5, 6, 8]. However, instructors still must solve the considerable problem of how to translate this research to a real-world setting.
There is evidence that providing students with instructional explanations when they are solving problems can benefit learning, which is intuitive. However, these explanations are not always effective, especially when they merely give away the answer rather than help students come to see how to solve a problem [8, 11]. For instance, when learners already have some knowledge about a subject, providing additional explanations instead of other knowledge-reinforcing activities can be detrimental to learning [12]. There has also been considerable research on prompting students to write their own explanations in a laboratory setting, finding that having students write their own explanations of key course concepts can help learning [10, 4].

Similarly, the effects of solving problems on learning can be varied and complex under different circumstances. For instance, there is a large body of research on the design of intelligent tutoring systems, which automatically determine which practice problems to show learners and in what order to improve their understanding most effectively [1]. However, mathematics and computer science education research point to the challenges in assuming additional practice problems are always helpful, as sometimes they are a poor use of students' time or lead students to focus on procedural knowledge instead of understanding the underlying principles [7, 9].

These studies have shown that even in a controlled laboratory environment, the effects of these intuitively helpful interventions vary significantly. Instructors seeking to apply these findings to their courses face the problem of translating findings from laboratory experiments to design decisions about what kind of support to provide in online problems and other educational environments.

The ICAP framework [2, 3] provides a theoretical basis for thinking about different educational methods by grouping them into levels based on the depth of students' engagement with the material. The levels are, from most to least engaging: interactive, constructive, active, and passive methods. Using this framework, one might expect the active learning approach of solving problems to be more effective for students' understanding than the passive approach of reading more explanations. Students must engage with practice problems at a deeper level than explanations, so practice problems might produce better learning outcomes. Comparing the effectiveness of these two methods enables us to test the active-passive boundary within the ICAP framework.

While the ICAP framework might lead one to predict that an additional problem should be more helpful to learning than an explanation, this could be confounded by the fact that the additional problem is optional. If students spend enough time thinking about and attempting the problem, it should improve their understanding beyond the improvement they would see from reading an explanation of the solution. However, it is also possible that students will dedicate less time and attention to the additional problem than they would to the explanation, as trying to solve a problem is a more daunting task than reading an explanation. Furthermore, there is the variable of time to improvement. Perhaps students will not see any immediate benefit from trying an additional problem, but doing it will help their learning in the longer term by cementing their understanding of the concept the problem tests. Will students benefit from additional homework immediately, or will the improvement affect how well the student remembers that week's material later? Both hypotheses appear plausible.

Likewise, one might expect that additional explanations will not have a significant effect on student learning. They fit into the passive category of the ICAP framework, which is the lowest level of engagement.
The explanation is also optional to read, so students might ignore it entirely. However, one might also expect that reading a well-written explanation of a concept will deepen a student's understanding of the concept they are being tested on. Furthermore, it could be the case that students forget the explanation as soon as they finish working on the exercise, but it could also improve their understanding over a longer period of time, similar to how they remember what they learned from lectures when completing homework.

Many counterintuitive effects have been found in prior education research, so it is essential to empirically study the effects of interventions before recommending them to instructors. In this paper, we extend past literature on the role of reading explanations and solving problems in learning and provide empirical evidence on how these forms of student support affect learning in a real-world setting. Ultimately, we hypothesize that students will perform better on subsequent tests of their understanding when they are shown an additional problem compared to when they are shown an explanation. This hypothesis is motivated by the active-passive boundary from the ICAP framework.

2. METHODS
The context for the experiment on explanations and additional practice problems was the Programming Course Resource System (PCRS) online homework system for the introductory computer programming course at the University of Toronto. This course spans one twelve-week semester, and students are given for-credit online homework exercises each week. The problem we deployed the supports on is shown in Figure 1. It asks students to analyze the runtime of a for loop in Python.

Figure 1: Screenshot of the problem on which the student supports were deployed in the PCRS homework system.

A total of 648 students completed the homework in week 10 of the course. 478 of these students also completed the optional follow-up exercise in week 12. There were 5 problems in each week. This choice of weeks ensures that there is considerable delay between the initial intervention and subsequent measurement, enabling us to measure long-term learning.

In the experiment, after students attempted a homework problem in week 10 of the course, we used a factorial design to independently vary two factors: whether an explanation was provided and whether an additional problem was provided. The experiment was performed in the context of a multiple-choice problem pertaining to run-time analysis, shown in Figure 1. Students were given course credit for completing the main problem, but did not have any direct incentive to read the explanation or attempt the additional problem.

To measure the impact on learning over a longer time frame, we designed a post-test with problems that were either identical to or variants of the problems asked in week 10. We gave these problems to students two weeks after the experiment (week 12). Some follow-up problems were identical to the corresponding week 10 problems, and others had features of the problem changed, such as having a loop executing 50 times rather than 30. Students were not at ceiling performance in the post-test, suggesting that they did not remember the exact answers by week 12, so these were non-trivial measures of learning. Figure 3 shows the names of problems in week 10 and the corresponding problems in the week 12 follow-up activity. All of these problems were focused on analyzing the runtime of Python programs.

Figure 3: Names of problems in week 10 (top: Repeat Chars, First Fifty, Nested For and Range, How Many - Two For Loops, How Many - Nested For Loops) with corresponding problems in week 12 (bottom: Repeat Chars, Repeat Chars v2, First Fifty, Nested For and Range v2, How Many - Nested For Loops). Blue lines indicate correspondence between problems. Problems with the same name are identical, while problems ending in "v2" are very similar but with minor differences such as different variable names. Learning supports were all deployed on the problem "Repeat Chars" in week 10.

2.1 Experimental Factors
We experimentally varied two variables in a factorial experiment. Each time a student submitted an answer to the first problem of the exercise, they were randomly assigned to a condition for the Explanation factor and the Additional Problem factor.

The Explanation factor had three levels: absent (none), short, or long. The short explanation states, "The third answer is correct because the code inside the for loop takes constant steps regardless of len(s) and it will be executed len(s) times." The long explanation states, "Suppose s = 'cat'. Then, double = double + ch * 2 will be executed 3 times because the for loop iterate through each character of s (i.e. 'c', 'a' and 't'). Now, suppose s = 'google'. Then double = double + ch * 2 will be executed 6 times. As you can see, if len(s) doubles, the number of steps also doubles. So, the third answer is correct."
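For concreteness, the kind of loop these explanations describe can be sketched as follows. This is a reconstruction based only on the explanation text quoted above; the exact code shown to students in Figure 1 is not reproduced here, and the function name and scaffolding are illustrative assumptions.

    # Illustrative reconstruction of the kind of loop the explanations refer to.
    # The function name and surrounding scaffolding are assumptions; the
    # problem's actual code appears only in the Figure 1 screenshot.
    def repeat_chars(s):
        double = ''
        for ch in s:                      # runs len(s) times
            double = double + ch * 2      # constant number of steps per iteration
        return double

    # The loop body executes once per character: 3 times for 'cat', 6 times
    # for 'google', so the number of steps grows linearly with len(s).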
The Additional Problem factor had two levels: absent (none) or present (one additional problem that was very similar to the problem students had attempted, in that it asked them to trace through a for loop and determine its time complexity). A screenshot of this problem is shown in Figure 2.[1]

Figure 2: An additional problem given to some students as a follow-up exercise in the online homework system.

[1] These factors were varied in the context of a larger experiment with more factors that will not be described in this paper in the interest of space. We used weighted randomization in favour of not showing students additional activities to avoid overwhelming them with too many activities.

To measure how well a student performs on a problem, we used the number of attempts until the first correct answer. This is simply the number of submissions made before the student selects all of the correct options and none of the incorrect options on the multiple-choice problem. For example, if a student gets the problem correct on their first try, their number of attempts is 1. If they get the first attempt wrong but the second attempt right, their number of attempts is 2.

To measure short-term improvement, we took the difference between the number of attempts on problem 1 of the week 10 exercise and the average number of attempts for the remaining four problems in that exercise. This number can be negative if students did worse on the remaining problems than they did on the problem we deployed the supports on, and the higher the number, the greater the improvement.

To measure improvement on the delayed exercise, we took the difference between the number of attempts on problem 1 of the week 10 exercise and the number of attempts on the exact same problem when presented in the week 12 follow-up.
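As a minimal sketch of how these two measures could be computed from per-student attempt counts (the data layout and function names are assumptions for illustration, not the study's actual analysis code):

    # Hypothetical sketch of the two improvement measures described above.
    def short_term_improvement(week10_attempts):
        """week10_attempts: attempts on the five week-10 problems,
        with the supported problem (problem 1) first."""
        first, rest = week10_attempts[0], week10_attempts[1:]
        # Positive values mean fewer attempts on the remaining problems.
        return first - sum(rest) / len(rest)

    def long_term_improvement(week10_problem1_attempts, week12_same_problem_attempts):
        """Difference in attempts between problem 1 in week 10 and the
        identical problem in the week 12 follow-up."""
        return week10_problem1_attempts - week12_same_problem_attempts

    # Example: 3 attempts on problem 1, then 2, 1, 2 and 1 on the rest -> 1.5
    print(short_term_improvement([3, 2, 1, 2, 1]))
    print(long_term_improvement(3, 2))  # -> 1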
3. RESULTS AND DISCUSSION
In this section, we first show a lack of evidence for an improvement in performance between the problem we added student support to in week 10 and the same problem given in a follow-up exercise in week 12. Next, we analyze the effects of the Explanation and Additional Problem factors. Our results did not reach the significance threshold of p < 0.05, and as such they should be interpreted with caution. We present suggestive evidence that the explanations were helpful on the same homework exercise (t(490)=-1.24, p=0.215), but not on the follow-up test two weeks later (t(364)=0.300, p=0.764). Finally, we show the reverse trend with the Additional Problem: it was not helpful in the same homework exercise (t(646)=0.158, p=0.874) but might have been in the follow-up exercise two weeks later (t(476)=-1.602, p=0.110). We interpret how these results can inform instructors' design choices and address possible limitations of the work.

3.1 Minor improvement on the same problem
Students took only slightly fewer attempts to get the problem correct in week 12 compared to week 10. While they took 2.07 attempts on average to get the answer right in week 10, they took 2.02 attempts to get the answer right in week 12, even though they had completed the same problem just two weeks before. We found little evidence that students improved between solving a problem in week 10 and solving the same problem in week 12 (t(1136)=-0.529, p=0.597). This suggests that students might not have remembered the answer to the problem even when they had already solved it two weeks earlier, meaning testing them on the same problem in week 12 appears to be a non-trivial measure of their understanding of the same concepts.

3.2 Explanations might have helped in the short term
We found no statistically significant difference between students who were given explanations and those who were not. However, the results suggest that when students were given explanations, they took slightly fewer attempts to get the right answer in subsequent problems than those who did not, regardless of whether they saw a short (t(490)=-1.24, p=0.215) or long explanation (t(491)=-1.29, p=0.195), as shown in Figure 4. However, the effect of seeing explanations was much smaller in the long term, as the sample means were similar in all three conditions. This is shown in Figure 5 and suggests that an explanation in a homework context might be useful only during that homework session. This could have happened because the problems tested a procedural skill, namely runtime analysis. While reading an explanation gives students a clear formula they can apply in subsequent runtime analysis, they might forget that formula when they stop working on their homework and lose the benefit of the explanation.

Figure 4: Effect of giving an explanation on number of attempts to get the right answer on subsequent problems in the same exercise (short-term difference in # attempts; none: mean=-0.305, n=337; short: mean=-0.098, n=155; long: mean=-0.095, n=156). T-test results: none vs. short: (t(490)=-1.24, p=0.215), none vs. long: (t(491)=-1.29, p=0.195), short vs. long: (t(309)=-0.0263, p=0.979).

Figure 5: Effect of giving an explanation on number of attempts to get the right answer on the same problem given two weeks later (long-term difference in # attempts; none: mean=0.034, n=262; short: mean=-0.029, n=104; long: mean=-0.062, n=112). T-test results: none vs. short: (t(364)=0.300, p=0.764), none vs. long: (t(372)=0.433, p=0.665), short vs. long: (t(214)=0.126, p=0.900).
3.3 Additional Problems might have helped in the long term
As with the Explanation factor, we did not find statistically significant evidence for a difference in means for the Additional Problem factor. However, we still found suggestive evidence that giving an additional problem has an effect in the long term. We did not find evidence for a difference between the performance of students who were shown an additional problem and those who were not on subsequent problems in the same homework exercise, but students who received the additional problem took fewer attempts in the post-test (t(476)=-1.602, p=0.110). These results are shown in Figures 6 and 7 respectively.[2] This difference might suggest that the value of the additional practice problem was primarily as a memory aid. Doing more problems could have helped students remember the skill they learned better when writing the post-test. If this knowledge is already in their minds when they are doing the exercise, it makes sense that they did not benefit immediately from more practice. However, they might remember more when writing the post-test, which would explain the improvement in performance there.

Figure 6: Effect of giving an additional problem on number of attempts to get the right answer on subsequent problems in the same exercise (short-term difference in # attempts; none: mean=-0.199, n=486; additional problem: mean=-0.222, n=162). T-test result: (t(646)=0.158, p=0.874).

Figure 7: Effect of giving an additional problem on number of attempts to get the right answer on the same problem given two weeks later (long-term difference in # attempts; none: mean=-0.079, n=367; additional problem: mean=0.252, n=111). T-test result: (t(476)=-1.602, p=0.110).

[2] After this analysis, we noticed that the control and experimental groups had different variances, which violates the assumption of the standard t-test. We then ran Welch's t-test and found a p-value of 0.07 (t(476)=-1.813, p=0.0712).
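As a sketch of the comparison described in the footnote: the standard two-sample t-test assumes equal variances in both groups, whereas Welch's t-test does not. With SciPy the two differ by a single argument; the score lists below are made-up placeholders rather than the study's data.

    # Hedged sketch of running the standard t-test vs. Welch's t-test.
    # The improvement scores below are placeholder values, not real data.
    from scipy import stats

    none_scores = [0.0, -0.5, 1.0, 0.5, -1.0]   # students shown no additional problem
    extra_scores = [1.0, 0.5, 0.0, 1.5, -0.5]   # students shown the additional problem

    # Student's t-test: assumes both groups share the same variance.
    t_std, p_std = stats.ttest_ind(none_scores, extra_scores)

    # Welch's t-test: drops the equal-variance assumption.
    t_welch, p_welch = stats.ttest_ind(none_scores, extra_scores, equal_var=False)

    print(t_std, p_std, t_welch, p_welch)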
3.4 Limitations
A notable limitation of this work was the lack of statistical significance. However, the results are consistent with each other and align with ideas from the ICAP framework. As such, they suggest a trend that could inform future research. In the interest of open and replicable science, it can be valuable to publish suggestive and negative results that do not meet the threshold for statistical significance. Real-world data is often messy, and suggestive results can reveal crucial new directions for analysis.

Another limitation of this work is that the problems in the week 12 follow-up were not all identical to those in the week 10 homework. They tested the same concepts and some were exact copies, but others were slight modifications of problems on the original homework. Therefore, the observed results might be due to the supports having different degrees of relevance to the problems in week 10 and week 12 rather than the duration between support and post-test. We have mitigated this by using the differences between the number of attempts on the problem we applied the explanations and additional problem to and the relevant subsequent problems as dependent variables: if an intervention improves students' scores on problem 1 in both week 10 and week 12, the changes to both scores cancel out when the difference is computed.

One might also raise the concern that we had different sample sizes in different conditions. More students were assigned to the "none" condition than the other conditions for both the Explanation and Additional Problem factors. We intentionally weighted the randomization in this way to minimize the burden on students from having too many additional activities, a strategy also used in randomized clinical trials in the medical field.
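A minimal sketch of this kind of weighted assignment, assuming hypothetical weights (the actual proportions used in the study are not stated):

    import random

    # Hypothetical sketch of weighted random assignment to conditions.
    # The weights below are illustrative, not the study's actual values;
    # they simply bias assignment toward the "none" conditions.
    def assign_conditions(rng=random):
        explanation = rng.choices(["none", "short", "long"], weights=[2, 1, 1])[0]
        additional = rng.choices(["none", "additional problem"], weights=[3, 1])[0]
        return explanation, additional

    # One assignment is drawn each time a student submits an answer to the
    # first problem of the exercise.
    print(assign_conditions())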
Considering that the effects of reading explanations and solving problems might vary widely with context, such as the week of a course in which supports were given, it is unclear how broadly the trends we identify in our data apply. While it appears possible that giving students more practice problems helps them develop lasting procedural knowledge of how to analyze the runtime of an algorithm, it is not clear that we can conclude the same about different tasks in computer science education, such as learning the syntax of a programming language or how to design an algorithm.

Finally, the week 12 post-test was optional, so dropout is a concern. While 648 students completed the exercise in week 10, only 478 completed the follow-up post-test in week 12. Therefore, the conclusions we draw about the effects of educational supports in the long term might reflect only the population of students who choose to complete the post-test. Though this was the majority of students, the reported effect could be different if, for instance, the students who are unlikely to do optional homework problems in week 12 are also unlikely to attempt an optional problem given to them in their week 10 homework exercise.

4. CONCLUSION AND FUTURE WORK
Our experiment investigated the effects of explanations and additional problems on performance both on a post-test and on subsequent problems in the same exercise. We found intriguing but not definitive insight into the effects of explanations. The mean number of attempts for students who saw either a short or long explanation was lower than for those who saw no explanation, but this difference was not statistically significant (t(490)=-1.24, p=0.215). It is possible that the explanations we showed students simply did not have an effect on their learning in either the short or long term. It could be that the explanations used in this experiment did not benefit students as much as they could have, and effort should be directed to designing better explanations. Alternatively, it is possible that the explanations helped students somewhat on the remaining problems in the homework exercise. If this result were replicated in a larger study, it would be interesting because it could guide instructors in deciding how to effectively incorporate instructional explanations into their courses.

In exploring the effect of additional problems, we found that the mean number of attempts on the equivalent post-test problem was lower for students who were shown an additional problem than for those who were not. This difference was not statistically significant, though we found stronger evidence for it than we found for explanations (t(476)=-1.602, p=0.110). As with the explanations, it is possible that the additional problem we gave students was truly not effective, and future work should focus on how to design more effective practice problems. However, if the long-term improvement as a result of the additional problem is replicated in subsequent large-scale experiments, it could provide guidance for instructors in deciding how to incorporate practice problems into their courses effectively.

If the results reported above reflect a real effect, they suggest that explanations are helpful in the short term, but not in the long term. Conversely, additional problems are helpful in the long term, but not in the short term. This aligns with what one might expect based on the ICAP framework, as solving a problem qualifies as deeper engagement with the learning material than reading an explanation. Instructors likely care more about whether their students retain information in the long term than about whether they understand concepts immediately, so focusing on the long-term learning measure makes sense.

The possible difference between the effects of these interventions is interesting and motivates further research into how the immediate and delayed effects of reading explanations and solving problems might differ. This might help guide instructors in thinking about the trade-offs involved in deciding when to give explanations to students and when to give them more problems.

Future work should investigate how generally this pattern holds. The part of the course we deployed these supports on focused on the procedural skill of reading an algorithm and analyzing its time complexity. Would giving an additional problem still be effective in teaching a different concept in the course, such as the difference between for and while loops? Perhaps additional problems are more helpful in developing procedural knowledge, while good explanations might be more effective in building propositional knowledge. By running similar experiments at different points in the introductory computer science course, we hope to learn more about which types of student support are helpful in developing different skills.

Additionally, we are interested in investigating whether the effects of these interventions differ across subgroups of students. One reason why we might not see a large average effect is that the effectiveness of different forms of support could vary significantly across students. Perhaps, for example, students who take a programming course out of intrinsic interest are more likely to benefit from an additional practice problem than those who take it to satisfy a breadth requirement. By analyzing this experimental data jointly with contextual variables derived from surveys and data mining, we hope to provide a richer picture of which forms of support work for which students and how instructors can tailor interventions more precisely to individual students' needs.
5. REFERENCES
[1] C. J. Butz, S. Hua, and R. B. Maguire. A web-based Bayesian intelligent tutoring system for computer programming. Web Intelligence and Agent Systems: An International Journal, 4(1):77–97, 2006.
[2] M. T. Chi. Active-constructive-interactive: A conceptual framework for differentiating learning activities. Topics in Cognitive Science, 1(1):73–105, 2009.
[3] M. T. Chi and R. Wylie. The ICAP framework: Linking cognitive engagement to active learning outcomes. Educational Psychologist, 49(4):219–243, 2014.
[4] J. L. Chiu and M. T. Chi. Supporting self-explanation in the classroom. In Applying Science of Learning in Education: Infusing Psychological Science into the Curriculum, pages 91–103, 2014.
[5] M. Feng, N. T. Heffernan, and J. E. Beck. Using learning decomposition to analyze instructional effectiveness in the ASSISTment system. In AIED, 2009.
[6] R. Hosseini, T. Sirkiä, J. Guerra, P. Brusilovsky, and L. Malmi. Animated examples as practice content in a Java programming course. In Proceedings of the 47th ACM Technical Symposium on Computing Science Education, SIGCSE '16, pages 540–545, New York, NY, USA, 2016. Association for Computing Machinery.
[7] J. Kay, M. Barg, A. Fekete, T. Greening, O. Hollands, J. H. Kingston, and K. Crawford. Problem-based learning for foundation computer science courses. Computer Science Education, 10(2):109–128, 2000.
[8] D. S. McNamara, T. O'Riley, and R. S. Taylor. Classroom based reading strategy training: Self-explanation vs. a reading control. In Proceedings of the Annual Meeting of the Cognitive Science Society, volume 28, 2006.
[9] N. Rummel, M. Mavrikis, M. Wiedmann, K. Loibl, C. Mazziotti, W. Holmes, and A. Hansen. Combining exploratory learning with structured practice to foster conceptual and procedural fractions knowledge. Singapore: International Society of the Learning Sciences, 2016.
[10] J. J. Williams and T. Lombrozo. The role of explanation in discovery and generalization: Evidence from category learning. Cognitive Science, 34(5):776–806, 2010.
[11] J. Wittwer, M. Nückles, and A. Renkl. Improving human tutoring by improving tutor-generated explanations. In Avoiding Simplicity, Confronting Complexity, pages 359–368. Brill Sense, 2006.
[12] J. Wittwer and A. Renkl. Why instructional explanations often do not work: A framework for understanding the effectiveness of instructional explanations. Educational Psychologist, 43(1):49–64, 2008.