PASTEL: Evidence-based learning engineering method to create intelligent online textbook at scale

Noboru Matsuda and Machi Shimmei
Center for Educational Informatics, Department of Computer Science
North Carolina State University
Noboru.Matsuda@ncsu.edu

Abstract. An extension of online courseware with macro-adaptive scaffolding, called CyberBook, is proposed. The macro-adaptive scaffolding includes (1) dynamic control over the amount of formative assessments, (2) just-in-time navigation to a direct instruction for formative assessment items that a student failed to answer correctly, and (3) embedded cognitive tutors that provide individual practice on solving problems. The paper also proposes two learning-engineering methods to efficiently create a CyberBook: a web-browser based authoring tool for cognitive tutors and a text-mining application for automated skill-model discovery and annotation. A classroom evaluation study to measure the effectiveness of CyberBook was conducted for two subjects: middle school science (Newton's Law) and high school math (Coordinate Geometry). The results show that students who used the fully functional Science CyberBook outperformed those who used a version of CyberBook with the macro-adaptive scaffolding turned off. However, the same effect was not observed for the Math CyberBook. For both subjects, students on the CyberBook with the macro-adaptive scaffolding answered fewer formative assessments due to the dynamic control. Further data analysis revealed that students who asked for more hints on the formative assessments achieved higher scores on the post-test than students who asked for fewer hints. The effect of hint usage was more prominent for students with low prior competency.

Keywords: Online Courseware, Macro-adaptive Scaffolding, Learning Engineering.

1 Introduction

One of the challenges for current online textbooks is a lack of individual support, which apparently hinders students' learning. For example, competency-based scaffolding is desired for students who need support tailored to their competency. The lack of an embedded student model results in excessive training, i.e., all students being exposed to a fixed amount of assessment regardless of their competencies, which severely impacts students' learning [1, 2]. Studies show that excessive training decreases students' motivation and causes early course termination [3-6].

A technology innovation to drive individualized scaffolding on a large-scale online textbook that can be plugged into existing online-course platforms is therefore critically needed. Without such technology, the online textbook will not fully show its potential to impact a large body of students' learning.

We hypothesize that lessons learned from long-standing research on intelligent tutoring systems, in particular skill-model based pedagogy, will apply to the large-scale online textbook. Skill-model based pedagogy requires a skill model (aka a knowledge-component model) that consists of skills, each representing a piece of knowledge that students ought to learn. Given that each individual piece of instructional content is tagged with a skill, the system computes a proficiency for each skill for each individual student to decide an appropriate pedagogical action.
Cognitive tutors, for example, deploy the model-tracing and knowledge-tracing techniques on top of a given skill model to drive micro-adaptive (flagged feedback and just-in-time hints) and macro-adaptive (problem selection) instruction [7, 8].

A major technical challenge in this line of research concerns the scalability of existing techniques for creating a skill model. A transformative technique to fully automatically create a skill model and annotate instructional materials on actual online courseware is desired.

A primary goal of the current paper is to introduce a platform-agnostic suite of learning-engineering methods (called PASTEL: Pragmatic methods to develop Adaptive and Scalable Technologies for next generation E-Learning) that allows courseware developers to efficiently create a particular type of online courseware called CyberBook. CyberBook is an intelligent textbook that provides students with macro-adaptive pedagogy driven by an embedded skill model and cognitive tutors. As a proof of concept, we have built two instances of CyberBook, one for middle school science and another for high school math, and tested their effectiveness with actual students.

2 Solutions: CyberBook and PASTEL

2.1 CyberBook

CyberBook is a structured sequence of instructional activities organized in multiple chapters, sections, and units. CyberBook contains three types of instructional elements: (1) direct instructions that convey subject matter (e.g., skills and concepts), usually as written paragraphs and videos; (2) formative assessments, typically in the form of multiple-choice or fill-in-the-blank questions; and (3) cognitive tutors that provide mastery practice on solving a particular type of problem. The cognitive tutors are equipped with hint messages, which are also considered instructional elements.

On CyberBook, the three types of learning activities may be placed on multiple pages that compose a "unit." Multiple units form a "section," and a collection of sections forms a "chapter." Each page has forward/backward navigation, but students may freely visit any page in any order through a table of contents that is available from any page.

The current version of CyberBook provides students with two types of macro-adaptive scaffolding: (a) dynamic control over the amount of formative assessments and cognitive tutors, and (b) just-in-time navigation to a direct instruction for a formative assessment that a student failed to answer correctly.

The first type of adaptive scaffolding, the dynamic control over the amount of assessments and practice, determines which formative assessment items and cognitive tutors should be given to individual students based on their competency. This dynamic control may reduce the number of unnecessary assessments, i.e., the excessive training. To judge whether an assessment item (either a cognitive tutor or a formative assessment) is beneficial to a particular student, the system applies the knowledge-tracing technique [9] to compute the probability of the individual student answering the next formative assessment item correctly. Based on the student model, those assessment items and cognitive tutors that the student would very likely (> 0.85) answer or perform correctly are automatically hidden from the student's view.
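To make the dynamic control concrete, the following is a minimal Python sketch of such a knowledge-tracing-based filter, using standard Bayesian Knowledge Tracing (BKT) update equations [9]. The parameter values, function names, and the decision rule around the 0.85 threshold are illustrative assumptions, not CyberBook's actual implementation.

# A minimal sketch of the dynamic control described above. The BKT parameters
# (slip, guess, transit) and the hide_item() rule are illustrative only.

MASTERY_THRESHOLD = 0.85  # items the student would "highly likely" answer correctly

def p_correct_next(p_know, p_slip=0.1, p_guess=0.2):
    """Predicted probability of answering the next item on this skill correctly."""
    return p_know * (1 - p_slip) + (1 - p_know) * p_guess

def bkt_update(p_know, correct, p_slip=0.1, p_guess=0.2, p_transit=0.15):
    """Update P(skill known) after observing one answer (standard BKT)."""
    if correct:
        evidence = p_know * (1 - p_slip)
        observed = evidence + (1 - p_know) * p_guess
    else:
        evidence = p_know * p_slip
        observed = evidence + (1 - p_know) * (1 - p_guess)
    posterior = evidence / observed
    return posterior + (1 - posterior) * p_transit  # learning opportunity

def hide_item(p_know):
    """Hide an assessment item whose skill the student has very likely mastered."""
    return p_correct_next(p_know) > MASTERY_THRESHOLD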
The second type of adaptive scaffolding, the just-in-time navigation, provides students with a link (called a dynamic link) to a corresponding direct instruction for (and only for) formative assessments and cognitive tutors that they failed to answer correctly.

To provide this macro-adaptive scaffolding, all the written instructional elements, i.e., text paragraphs, assessment items, and cognitive tutors, are tagged with skills. This skill tagging is done automatically by the SMART method introduced in the next section.

The concept of CyberBook is platform independent, hence it can be implemented on any online learning platform. Currently, as a proof of concept, we have prototyped CyberBook on Open edX (edx.org) and the Open Learning Initiative (Carnegie Mellon University).

2.2 PASTEL

PASTEL is a collection of learning-engineering methods to efficiently build online courseware with an embedded skill model and cognitive tutors. In this paper, we describe the two PASTEL methods that are used for the current study: a text-mining application for automated skill-model discovery (SMART) and a web-browser based cognitive tutor authoring tool (WATSON).

SMART: Skill Model mining with Automated detection of Resemblance among Texts

SMART is a method for automatic discovery of a skill model from a given set of instructional texts. The unit of analysis is a "text," which is either a written paragraph, the question sentences for a single assessment item, or the hint messages for a single cognitive tutor.

SMART first applies the k-means text clustering technique [10] to divide assessment items into clusters with similar semantic meanings. Prior to clustering and keyword extraction, all "texts" are distilled by removing punctuation and stopwords, i.e., words that carry little grammatical value (e.g., articles, conjunctions, and prepositions). Distilled "texts" are split into words (aka tokens). Each tokenized "text" is converted into a Term Frequency (TF) vector over the set of all tokens appearing in the given texts (called the token space). For example, the i-th element of the TF vector for a "text" corresponding to a written paragraph shows the frequency of the i-th token of the token space appearing in the "text" (or zero if the "text" does not contain that token).

Our naïve assumption is that if a set of "texts" are all about the same latent skill (e.g., paragraphs explaining a concept X and assessment items asking about the concept X), the latent skill can be identified from that set of "texts." We therefore assume that assessment items in a particular cluster assess the same particular skill. Each cluster of assessments is given a label, which becomes a skill name, by applying a keyword-extraction technique, TextRank [11]. Finally, each written paragraph and each hint message (of a cognitive tutor) is paired with the closest cluster, i.e., a skill, using the cosine similarity [12]. As a result, the instructional elements on CyberBook are fully automatically tagged with skills, and a three-way skill mapping among written paragraphs, assessment items, and cognitive tutors (through their hint messages) is formed.
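As a rough illustration of this pipeline, the sketch below clusters assessment texts with k-means over term-frequency vectors and tags paragraphs with the closest cluster via cosine similarity. It is a simplified approximation under stated assumptions: scikit-learn's built-in English stopword list stands in for SMART's distillation step, and the TextRank labeling of clusters is omitted.

# A simplified sketch of the SMART tagging pipeline (clustering and pairing only).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.cluster import KMeans
from sklearn.metrics.pairwise import cosine_similarity
import numpy as np

def build_skill_model(assessment_texts, paragraph_texts, n_skills=10):
    # Term-frequency vectors over a shared token space; English stopwords removed.
    vectorizer = CountVectorizer(stop_words="english")
    tf_assess = vectorizer.fit_transform(assessment_texts)

    # Cluster assessment items; each cluster is treated as one latent skill.
    km = KMeans(n_clusters=n_skills, n_init=10, random_state=0).fit(tf_assess)

    # Pair every paragraph with its closest cluster centroid, i.e., a skill.
    tf_para = vectorizer.transform(paragraph_texts)
    sims = cosine_similarity(tf_para, km.cluster_centers_)
    paragraph_skill = np.argmax(sims, axis=1)

    # Skill tags for assessment items and paragraphs, respectively.
    return km.labels_, paragraph_skill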
WATSON: Web-based Authoring Technique for adaptive tutoring System on Online courseware

WATSON is a web-browser based authoring tool to create cognitive tutors by demonstration. Fig. 1 shows an example screenshot of WATSON.

Fig. 1: An example screenshot of WATSON.

Cognitive tutors allow students to practice solving problems while providing double-looped micro-adaptive scaffolding: scaffolding between problems (aka the outer loop) and within a problem (aka the inner loop) [7]. The tutor continuously provides students with problems, which the students solve step by step, until they show mastery in solving the problems. The outer-loop scaffolding uses domain pedagogy to pose a problem to be solved next that maximizes the student's likelihood of achieving mastery. The inner-loop scaffolding uses domain knowledge to provide immediate flagged feedback on the correctness of the steps performed and just-in-time, on-demand hints on how to perform the next step.

The outer loop is driven by the knowledge-tracing technique [9], whereas the inner loop is driven by the model-tracing technique [13]. Both techniques are task independent and rely only on a given domain expert model, written as a set of production rules each of which represents a skill that students ought to learn. This implies that creating a cognitive tutor is reduced to creating a domain expert model, a set of problems used for tutoring, and a tutoring interface.

WATSON is built on a third-party Cognitive Tutor Authoring Tool (CTAT) [14]. To build a cognitive tutor using WATSON, an author first uses CTAT to create a tutoring interface directly on a web browser. CTAT outputs HTML5 code for the tutoring interface. WATSON renders the HTML5-based tutoring interface on a web browser with an additional graphical user interface for the author to interactively create a domain expert model (Fig. 1-a). To create the domain expert model, the author interactively tutors a machine-learning agent, called SimStudent [15], through the tutoring interface. The author poses a problem for SimStudent and asks SimStudent to solve the problem (Fig. 1-d). SimStudent may attempt to solve the problem by performing one step at a time (which, in this example, corresponds to entering a value in a text box shown on the tutoring interface). When SimStudent performs a step, it asks the author to provide feedback on the correctness (Fig. 1-e). The author responds by clicking the [yes/no] button. When SimStudent gets stuck on performing a step, it asks the author to demonstrate the next step. The author then performs that step on the tutoring interface. Through the interactive tutoring, SimStudent produces a set of production rules each of which corresponds to a single step reified on the tutoring interface. Each production rule therefore represents a particular skill that is sufficient to perform a particular step. The author provides a name for each production rule while tutoring. These production rules are used as the domain expert model for the cognitive tutor, together with the exact HTML5 tutoring interface used to tutor SimStudent.

While authoring, a list of skills and a list of problems tutored are displayed (Fig. 1-b and c, respectively). When the author clicks on a skill name on the graph (Fig. 1-b), the current production rule, written in the Jess language [16], is shown on a separate browser tab. When the author mouses over a name listed in the problem bank (Fig. 1-c), the names of the skills used to solve the corresponding problem are shown in a pop-up dialogue.
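The inner-loop behavior that the authored production rules drive can be summarized with the following hypothetical sketch of model tracing. For readability, production rules are represented here as plain Python callables that either propose a step for the current problem state or return None; WATSON's actual expert model consists of Jess production rules learned by SimStudent.

# A hypothetical sketch of inner-loop model tracing (flagged feedback and hints).

def model_trace(state, student_step, rules):
    """Flag a student's step as correct if some production rule fires on the
    current problem state and produces the same step."""
    for rule in rules:
        model_step = rule(state)            # None if the rule does not apply
        if model_step is not None and model_step == student_step:
            return True, rule.__name__      # correct; also names the skill applied
    return False, None                      # no rule reproduces the step: flag incorrect

def next_step_hint(state, rules):
    """On a hint request, suggest the step proposed by the first applicable rule."""
    for rule in rules:
        model_step = rule(state)
        if model_step is not None:
            return f"Try this: {model_step} (skill: {rule.__name__})"
    return "No applicable skill found for this state."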
3 Evaluation Study

To measure the effectiveness of CyberBook, we conducted an evaluation study at our partner schools using the instances of CyberBook for middle school science and high school math. The Science CyberBook had 11 sections (40 units) with 17 videos and 83 formative assessments. There were no cognitive tutors embedded in the Science CyberBook due to a constraint on the timing of the study and the development cycle. All of the adaptive scaffolding functionalities mentioned earlier were available. The Math CyberBook, on the other hand, had 23 sections (26 units) with 179 formative assessments and 14 cognitive tutors. No video was used for the Math CyberBook, partly because the in-service math teacher who led the curriculum design believed that videos would not be necessary if the curriculum had robust and rich instructions and graphics.

3.1 Method

The school study was a stratified randomized controlled trial with two treatment conditions: fully functional CyberBook (the Adaptive condition, hereafter) vs. a version of CyberBook without the macro-adaptive scaffolding (the Non-Adaptive condition). For Science, two public middle schools in Texas, USA participated with 131 and 34 students in 6 and 2 science classes, respectively. For Math, 143 and 25 students in 5 and 2 math classes from two public high schools in Texas, USA participated. The study was conducted in their usual science and math class periods as part of their business-as-usual classroom activities.

The school study sessions spanned 5 days, one classroom period per day. On Day 1, all students took a pre-test. For Science, the test lasted 20 minutes with 18 multiple-choice questions. For Math, the test lasted 30 minutes with 21 multiple-choice questions. For both subjects, the test was printed on paper, but students were asked to enter their answers through an online form.

After taking the pre-test, students were randomly assigned to one of two conditions using stratified randomization based on the pre-test score, i.e., the assignment aimed to minimize the difference in the mean pre-test scores between the two conditions; for Science, M_Adaptive = 0.68 ± 0.23 vs. M_Non-Adaptive = 0.70 ± 0.25, t(140) = 0.76, p = 0.45; for Math, M_Adaptive = 0.36 ± 0.19 vs. M_Non-Adaptive = 0.36 ± 0.19, t(132) = -0.47, p = 0.64.

On Day 2 through Day 4, students used their assigned version of CyberBook. During this phase, students worked on CyberBook at their own pace while they were encouraged to ask their teachers questions, if necessary. Equally, teachers were encouraged to interact with their students in the same way as they usually do in their classrooms. On Day 5, students took the post-test, which was isomorphic to the pre-test, i.e., it had the same number and types of problems that can be solved by applying the same knowledge; the post-test items differed only in their cover stories and the quantities used.

Two researchers attended each of the classroom sessions to take field observation notes and to help students overcome any technical issues. These researchers did not provide students with any instructional scaffolding (but only encouraged students to ask their teachers for assistance when needed).

3.2 Results

There were no particular exclusion criteria for participants during the study; all students were welcome to participate in any part of the study. In the following analysis, however, we include only those students who took both the pre- and post-tests and attended all three days of the intervention.
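For reference, this inclusion filter amounts to a few lines over the participation records, as in the sketch below; the DataFrame and its column names are hypothetical stand-ins for the actual study logs.

# A small pandas sketch of the inclusion filter used for the analysis below.
import pandas as pd

def apply_inclusion_criteria(students: pd.DataFrame) -> pd.DataFrame:
    """Keep students who took both tests and attended all three intervention days."""
    return students[
        students["took_pretest"]
        & students["took_posttest"]
        & (students["intervention_days_attended"] == 3)
    ]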
Table 1 shows the number of participants who took the pre- and post-tests, respectively, the number who attended all three days of the intervention (i.e., Day 2 through Day 4), and the number who meet the inclusion criteria.

Table 1: The count of students who participated in the study and who meet the inclusion criteria (took both pre- and post-tests and attended all three days of intervention). Parentheses show a breakdown into conditions (Adaptive/Non-Adaptive).

          Enrollment   Pre-test        Post-test       Sufficient intervention   Students included
Science   165          155 (77/78)     154 (79/75)     157 (80/77)               144 (73/71)
Math      168          148 (76/72)     153 (76/77)     159 (78/81)               134 (67/67)
Total     333          303 (153/150)   307 (155/152)   316 (158/158)             278 (140/138)

Table 2: Mean test scores (standard deviations in parentheses).

                Science                      Math
                Pre-test      Post-test      Pre-test      Post-test
Adaptive        0.68 (0.22)   0.77 (0.16)    0.35 (0.18)   0.45 (0.19)
Non-Adaptive    0.71 (0.25)   0.74 (0.19)    0.37 (0.19)   0.46 (0.23)

Test Scores: Table 2 shows the mean test scores comparing the two conditions for both Science and Math. To see whether there was an effect of the macro-adaptive scaffolding on students' learning, a repeated-measures ANOVA was conducted for each subject independently, with test score as the dependent variable and test-time (pre vs. post) and condition (Adaptive vs. Non-Adaptive) as fixed factors.

For Science, there was an interaction between condition and test-time; F(1,142) = 5.61, p < 0.05. A post-hoc analysis revealed that only the Adaptive condition showed an increase in test score from pre- to post-test; for the Adaptive condition: paired t(72) = -5.18, p < 0.001, d = 0.46; for the Non-Adaptive condition: paired t(70) = -1.52, p = 0.13, d = 0.13. In the science classes, students who used the version of CyberBook with the macro-adaptive scaffolding outperformed students without the adaptive scaffolding on the post-test.

For Math, there was a main effect of test-time (F(1,132) = 61.57, p < 0.001), but no main effect of condition (F(1,132) = 0.23, p = 0.63). In the math classes, students' test scores increased from pre- to post-test equally, regardless of whether the macro-adaptive scaffolding was available.

Behavior Analysis: To understand why only the Science CyberBook showed an effect of the macro-adaptive scaffolding, we analyzed the process data showing detailed interactions between students and the system while they were working on CyberBook. The process data contain the clickstream data (including information about the assessment items such as the problem ID and the skills associated with each problem) and the correctness of the students' answers.

We first hypothesized that there was a condition difference in the way students watched videos vs. answered formative assessments on the Science CyberBook (there was no video on the Math CyberBook). Not surprisingly, there was a notable condition difference in the number of formative assessments students answered on the Science CyberBook; M_Adaptive = 62.2 ± 16.93 vs. M_Non-Adaptive = 74.7 ± 12.69, t(133) = -5.04, p < 0.001, d = 0.84. The dynamic control over the amount of problems effectively reduced the number of formative assessments for the Adaptive students. There was, however, no statistically reliable relationship between the number of formative assessments answered and the post-test score when the pre-test was entered as the primary factor in a regression model; F(1,141) = 3.08, p = 0.08.
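The condition comparisons reported in this section follow a standard recipe: an independent-samples t-test plus Cohen's d as the effect size. A minimal sketch, with illustrative variable names and a pooled-variance d, is shown below.

# A sketch of the condition comparison (t-test and Cohen's d); names are illustrative.
import numpy as np
from scipy import stats

def compare_conditions(adaptive_counts, non_adaptive_counts):
    t, p = stats.ttest_ind(adaptive_counts, non_adaptive_counts)
    # Cohen's d with a pooled standard deviation.
    n1, n2 = len(adaptive_counts), len(non_adaptive_counts)
    pooled_sd = np.sqrt(((n1 - 1) * np.var(adaptive_counts, ddof=1)
                         + (n2 - 1) * np.var(non_adaptive_counts, ddof=1))
                        / (n1 + n2 - 2))
    d = (np.mean(adaptive_counts) - np.mean(non_adaptive_counts)) / pooled_sd
    return t, p, d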
There was no condition difference in the number of videos watched either; M_Adaptive = 25.6 ± 26.23 vs. M_Non-Adaptive = 25.0 ± 28.92, t(128) = 0.145, p = 0.89. The doer/non-doer effect, which predicts that learning by doing (i.e., working on formative assessments) facilitates learning better than watching videos [17], was not present in the current study on the Science CyberBook. As a side note, for Math, there was no condition difference in the number of formative assessments students answered; M_Adaptive = 55.5 ± 12.79, M_Non-Adaptive = 51.0 ± 15.65, p = 0.07, d = 0.32.

Second, we hypothesized that the dynamic link, which was only available to Adaptive students, effectively facilitated learning on the Science CyberBook. This hypothesis was not supported. To our surprise, the average number of dynamic-link clicks was quite low; M = 0.4 ± 1.8. It turned out that, in the instances of CyberBook used in the current study, most of the linked contents were placed on the same page as the assessment item, at a relatively close distance. The field observation notes collected during the classroom sessions mentioned that students noticed they were simply able to scroll up the page to review the related content instead of clicking on the dynamic link.

Third, we explored students' engagement in learning by doing, i.e., how seriously students worked on the formative assessments. In particular, we investigated whether Adaptive students worked on multiple-choice questions more seriously than Non-Adaptive students. On the Science CyberBook, about two thirds of the formative assessments are multiple-choice questions. The degree of engagement on the multiple-choice questions might have had a significant impact on students' learning [18]. We operationalized the "seriousness" as the number of choice items submitted before making a correct answer. Since CyberBook provides immediate feedback on an answer submission, students who were not engaged in learning might merely try choice items one by one until they see affirmative feedback. We hypothesized that the ratio of choice items (RCI) submitted before submitting a correct answer on multiple-choice questions would be lower among Adaptive than Non-Adaptive students. This hypothesis was not supported. There was no condition difference in the average RCI per student; for Science, M_Adaptive = 0.48 ± 0.06 vs. M_Non-Adaptive = 0.47 ± 0.07; t(140) = 0.78, p = 0.44. The same trend was observed for Math; M_Adaptive = 0.52 ± 0.08 vs. M_Non-Adaptive = 0.50 ± 0.07; t(129) = 0.78, p = 0.20.

Fourth, we investigated the difference in hint usage between Adaptive and Non-Adaptive students. In particular, we hypothesized that Adaptive students used hints more frequently when they failed to answer a formative assessment item correctly. We operationalize the hint usage on failed assessment items (per student) as the ratio of the number of assessment items on which a student failed to answer correctly and asked for a hint to the total number of assessment items that the student failed to answer correctly, denoted as the Hint on Failure Ratio (HFR).
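A minimal sketch of how these two measures (RCI and HFR) could be computed from the clickstream logs is given below. The column names (student_id, item_id, correct, hint_requested, choices_submitted, n_choices) are hypothetical, and the per-item aggregation is a simplifying assumption rather than the exact procedure used in the study.

# Hedged sketch: computing RCI and HFR from a per-attempt clickstream log.
import pandas as pd

def hint_on_failure_ratio(log: pd.DataFrame) -> pd.Series:
    """HFR per student: failed items with at least one hint request,
    divided by all failed items."""
    failed = log[~log["correct"]]
    hinted = failed.groupby(["student_id", "item_id"])["hint_requested"].any()
    return hinted.groupby(level="student_id").mean()

def ratio_of_choice_items(log: pd.DataFrame) -> pd.Series:
    """RCI per student: choices submitted before the correct answer, as a
    fraction of the available choices, averaged over multiple-choice items."""
    mc = log[log["n_choices"].notna()].copy()
    mc["rci"] = mc["choices_submitted"] / mc["n_choices"]
    return mc.groupby("student_id")["rci"].mean()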
The hypothesis on hint usage was supported only for Science. When aggregated across all students within each condition, there was a condition difference in HFR for Science; M_Adaptive = 0.32 ± 0.23 vs. M_Non-Adaptive = 0.23 ± 0.19; t(138) = 2.40, p < 0.05, d = 0.40. For Math, the condition difference was weaker; M_Adaptive = 0.31 ± 0.26 vs. M_Non-Adaptive = 0.24 ± 0.24; t(130) = 1.71, p = 0.09, d = 0.30. However, a regression analysis did not confirm a correlation between HFR and post-test score when pre-test was entered into the model as the primary factor; for Science, pre-test was a significant predictor, F(1,141) = 177.9, p < 0.0001, but HFR was not, F(1,141) = 1.28, p = 0.26. The same trend was observed for Math; pre-test F(1,130) = 167.4, p < 0.0001; HFR F(1,130) = 3.17, p = 0.08.

A further analysis revealed that the correlation between HFR and post-test score was negative; for Science r(142) = -0.34, p < 0.001; for Math r(131) = -3.99, p < 0.001. Though this finding seemed counterintuitive at first, we hypothesized that (1) students with a low prior competency (measured by the pre-test score) needed more hints on failed assessments, i.e., HFR and pre-test score were negatively correlated, and (2) pre-test and post-test scores were highly positively correlated, as is almost always the case in school evaluation studies. If this hypothesis is true, then we should see a more evident condition difference in HFR among low-prior students than among high-prior students.

This hypothesis was supported, as shown in Table 3. It was only for Science that condition (Adaptive vs. Non-Adaptive) had a main effect on HFR for low-prior students. Understanding the exact reason why low-prior Adaptive students used more hints than low-prior Non-Adaptive students requires further investigation. We suspect that the dynamic link (available only to Adaptive students) being located physically close to the hint button might have had a positive influence on students' hint usage. Interestingly, Table 3 shows similar HFR values for low-prior students in Science and Math. The lack of statistical significance for Math is arguably due to a larger variance.

Table 3: The average ratio of asking for a hint on formative assessment items on which students failed to answer correctly (HFR). Numbers in parentheses show standard deviations. The condition difference is only statistically significant for low-prior students on the Science CyberBook.

              Science                           Math
              Adaptive        Non-Adaptive      Adaptive        Non-Adaptive
Low Prior     0.41 (0.24)a    0.27 (0.20)a      0.39 (0.28)b    0.27 (0.28)b
High Prior    0.22 (0.19)     0.19 (0.18)       0.23 (0.21)     0.20 (0.17)
a: t(69) = 2.57, p < 0.05; b: t(65) = 1.69, p = 0.10.

4 Discussions

It is not entirely clear why the doer/non-doer effect was not confirmed in the current study, and it hence needs more investigation. A previous study [17] reported that learning by doing (i.e., answering formative assessments and receiving feedback/hints) is six times more effective than watching videos and reading texts. One potential hypothesis for the doer effect not appearing in the current study is that almost all students might have worked on a sufficient number of formative assessments, so they were all more or less equally doers (hence no correlation between the number of assessments and test score was observed). The students' competency might be another factor. In the current study, the number of assessments is determined based on the student's competency; those with lower competency received more assessments. Therefore, it might not be surprising to see a negative correlation between the number of formative assessments answered and learning outcome (pre-test is strongly correlated with post-test after all).

The dynamic link was apparently not functioning as expected in the current study, anecdotally because students noticed that related contents were just a scroll away.
Unfortunately, no logging was made for this type of behavior (i.e., scrolling through pages and reviewing related contents). Therefore, it is technically challenging to comprehensively evaluate the effect of the dynamic link in the current data. Improving the logging function to track the precise usage of the dynamic link is one of the subjects for future system development.

The effect of the proposed macro-adaptive scaffolding was not replicated between the Science and Math courseware. The current study is somewhat confounded due to the difference in the availability of videos (available only for Science) and cognitive tutors (only for Math). A further, thorough study is needed on when and how the macro-adaptive scaffolding facilitates students' learning.

5 Conclusion

We found that online courseware with macro-adaptive scaffolding, including dynamic control over the amount of formative assessments and cognitive tutors, amplified students' learning in a middle school science course, but the effect was not replicated in a high school geometry course. The major differences between these two instances of courseware include that (1) only the science courseware contained 17 videos, and (2) only the math courseware contained 14 cognitive tutors. The current data suggest that it was the use of hints for formative assessment items on which students failed to answer correctly that correlated with learning outcome, and this effect was present only among those students who had a low prior competency, measured by the pre-test score. This effect was only observed for the science course. Understanding why the same was not the case for the math course requires further analysis.

Creating effective online courseware at scale is one of the most pressing challenges in the current cyberlearning era. The current paper demonstrated the fidelity of implementation of two learning-engineering methods to build practical online courseware with macro-adaptive scaffolding. More studies are needed to understand what exactly is needed to develop practical learning-engineering methods with a firm impact on students' learning with a diverse population.

References

1. Koedinger, K.R., E.A. McLaughlin, and J.C. Stamper, Automated student model improvement, in Proceedings of the 5th International Conference on Educational Data Mining, K. Yacef, et al., Editors. 2012. p. 17-24.
2. Martin, B., et al., Evaluating and improving adaptive educational systems with learning curves. User Modeling and User-Adapted Interaction, 2011. 21(3): p. 249-283.
3. Gates, S.J., et al., eds. Engage to Excel: Producing One Million Additional College Graduates with Degrees in Science, Technology, Engineering, and Mathematics. 2012, PCAST STEM Undergraduate Working Group: Office of the President, DC.
4. Goodman, I.F., Final Report of the Women's Experiences in College Engineering (WECE) Project. 2002, Cambridge, MA: Goodman Research Group.
5. Seymour, E. and N.M. Hewitt, Talking About Leaving: Why Undergraduates Leave the Sciences. 1997, Boulder, CO: Westview Press.
6. Watkins, J. and E. Mazur, Retaining students in science, technology, engineering, and mathematics (STEM) majors. Journal of College Science Teaching, 2013. 42(5): p. 36-41.
7. VanLehn, K., The Behavior of Tutoring Systems. International Journal of Artificial Intelligence in Education, 2006. 16.
8. Ritter, S., et al., Cognitive tutor: Applied research in mathematics education. Psychonomic Bulletin & Review, 2007. 14(2): p. 249-255.
9. Corbett, A.T. and J.R. Anderson, Knowledge tracing: Modeling the acquisition of procedural knowledge. User Modeling and User-Adapted Interaction, 1995. 4(4): p. 253-278.
10. Hartigan, J.A. and M.A. Wong, Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society, Series C, 1979. 28(1): p. 100-108.
11. Mihalcea, R. and P. Tarau, TextRank: Bringing order into texts, in Proceedings of EMNLP, D. Lin and D. Wu, Editors. 2004: Barcelona, Spain. p. 404-411.
12. Salton, G. and M.J. McGill, Introduction to Modern Information Retrieval. 1983, Auckland: McGraw-Hill.
13. Anderson, J.R. and R. Pelletier, A development system for model-tracing tutors, in Proceedings of the International Conference on the Learning Sciences. 1991. p. 1-8.
14. Aleven, V., et al., The Cognitive Tutor Authoring Tools (CTAT): Preliminary evaluation of efficiency gains, in Proceedings of the 8th International Conference on Intelligent Tutoring Systems, M. Ikeda, K.D. Ashley, and T.W. Chan, Editors. 2006, Springer Verlag: Berlin. p. 61-70.
15. Matsuda, N., W.W. Cohen, and K.R. Koedinger, Teaching the Teacher: Tutoring SimStudent leads to more Effective Cognitive Tutor Authoring. International Journal of Artificial Intelligence in Education, 2015. 25: p. 1-34.
16. Friedman-Hill, E., Jess in Action: Java Rule-based Systems. 2003, Greenwich, CT: Manning.
17. Koedinger, K.R., et al., Learning is Not a Spectator Sport: Doing is Better than Watching for Learning from a MOOC, in Proceedings of the Second ACM Conference on Learning@Scale. 2015, ACM. p. 111-120.
18. Marsh, E.J., et al., The memorial consequences of multiple-choice testing. Psychonomic Bulletin & Review, 2007. 14(2): p. 194-199.