Recommending Learning Materials to Students by Identifying their Knowledge Gaps

Konstantin Bauman
Stern School of Business, New York University
kbauman@stern.nyu.edu

Alexander Tuzhilin
Stern School of Business, New York University
atuzhili@stern.nyu.edu

ABSTRACT
We propose a new content-based method of recommending educational materials to students by identifying gaps in their knowledge of the subject matter in the courses they take. We experimentally validate our method by conducting an A/B test on students from an online university.

Keywords
content-based recommendations; technology-enhanced learning; knowledge gaps

1. INTRODUCTION
Due to the recently increased interest in online educational technologies and educational delivery methods, the topic of recommendations in the educational domain has become increasingly important. In particular, it has been studied in various communities, including RecSys, UMAP, Advanced Learning Technologies, and the Technology-Enhanced Learning communities, and many approaches have been proposed on how to recommend learning materials to students to improve their learning performance [2].

One such recommendation method is based on the idea of identifying and filling the “gaps” in students’ knowledge in the subjects that they are studying. The idea of gap identification is not new, however. For example, Ciuciu and Demey referred to it in [1] and proposed an initial approach on how to deal with it. Unfortunately, they stopped short of describing the specific recommendation algorithm, leaving it as a topic of future research. Also, [3, 4, 5] proposed methods that are somewhat related to the “gap filling” idea, but the authors mainly focused on developing their frameworks and not on presenting specific recommendation algorithms.

In this paper, we present a novel method of identifying gaps in students’ knowledge and propose specific algorithms to fill in these gaps by recommending remedial learning materials to the students. In contrast to many prior learning recommendation methods that are predominantly rating-based [2], our method, described in Section 2, is content-based. In addition to developing this method, we also performed A/B testing on the students of a leading online university to validate our approach. We present our experiments and the preliminary results in Sections 3 and 4.

Copyright is held by the author/owner(s). RecSys 2014 Poster Proceedings, October 6-10, 2014, Foster City, Silicon Valley, USA.

[Figure 1: Part of Taxonomy for Art History Course. The figure is a topic tree rooted at Art History; node labels include Revival and Rebirth in Europe, Renaissance in Italy, The End of the Renaissance and the Reformation, Rococo, Flanders, Florence, High Renaissance, and Northern Renaissance.]

2. RECOMMENDATION METHOD
Our recommendation method is based on the “gap filling” idea discussed in Section 1. In particular, for each course in a curriculum, we build a taxonomy of the topics covered in that course. For example, Fig. 1 shows a part of the Art History course taxonomy where each node represents a topic. A node in the taxonomy has a set of obligatory reading materials chosen by the instructor and associated with this topic.

For each student and course offering, we determine how well the student understood all the topics specified in the course taxonomy by analyzing the student's performance data in that course. At the end of this analysis, each student gets a performance score for each topic in the course taxonomy specifying how well the student understood that topic. For example, in the Art History course, for the topic Rococo, Joe got the score 0.94 while John got 0.67. This means that Joe understood Rococo well, while John did not. Although this score can be computed in many different ways, in our experiments described in Section 3 we have done it as follows. For each test taken by the student and each question on the test, we determine the list of topics in the course taxonomy to which this test question corresponds. Then for each topic we determine the list of questions corresponding to it and see how well the student answered these questions. For example, if there are 10 questions in the test corresponding to the topic Rococo and Joe answered 9 of them correctly, then Joe’s score for this topic is 0.9.

After we determine students’ performance scores for each topic in the course taxonomy, we identify their knowledge gaps, i.e., those topics on which they performed poorly. In particular, a student has a knowledge gap for a topic if either (a) the performance score of the student for this topic is low (i.e., below a certain threshold level) or (b) the student has knowledge gaps for a sufficient number of subtopics of that topic (and therefore needs remedial actions for these subtopics).

After we identify the knowledge gaps, we determine what types of remedial materials should be recommended to the students in order for them to close these gaps. We accomplish this task as follows. First, we build a library of related reading materials for each course consisting of (but not limited to) the most popular textbooks, online articles and various web pages related to the course. Each document in this library can have its own taxonomy that is based on the document’s table of contents. For example, a textbook is divided into chapters, sections and subsections. In contrast, some other documents, such as short articles, may not have any taxonomy and therefore are not “divisible” into smaller pieces. Second, we establish the relationship between the materials in this library and the course taxonomy as follows. For each node in the course taxonomy we identify the “unit of knowledge” in the library (e.g., a book chapter) that corresponds to it best, thus establishing the link between the node and the reading material. In particular, we do this identification by using a TF-IDF-based measure of correspondence between the book unit and the textual description of the topic.

Given the structure of the course, the identified gaps in student knowledge in the class, and the links between the topics in the course taxonomy and the supplementary reading materials from the library that we described in the previous paragraph, we next recommend these supplementary reading materials to the students in order to close their knowledge gaps. In particular, for each knowledge gap topic node in the taxonomy, we recommend the supplementary reading materials linked to that node.

3. EXPERIMENTAL SETTINGS
To validate our approach, we tested it on students of an online university by conducting an A/B test. In particular, we worked with 527 students from all over the world taking one or more courses in that university over a period of one semester that lasted 9 weeks (8 weeks of studies and one week for the final exams). There were 25 different courses offered during that semester covering the areas of Computer Science (10 courses), Business (10 courses) and General Studies (5 courses). In total, we had 692 enrollments of all these students in the courses (i.e., 692 student/course pairs) during that semester. Studies during each week are carefully structured in that university and consist of (a) a set of obligatory reading materials, (b) various assignments, (c) questions to be discussed on the discussion forums and (d) a self-testing quiz (not contributing to the overall grade for the course). There are also two quizzes administered by the university during the semester that contribute to the final grade for the course, as well as the final exam given at the end of the semester during week 9.

In our experiments, we split the students into the following three groups. The first group received personalized recommendations as described in Section 2. The second group received the standard set of (non-personalized) recommendations where all the students got the same set of recommendations as the worst students in the personalized group who failed all their tests (and therefore needed help for all the topics in the course). The third group is the control group of students who did not receive any recommendations.

We provided recommendations to the first and the second groups up to three times. The first recommendation of the supplementary reading materials was provided shortly before the students took graded Quiz 1. The second one was provided before the students took graded Quiz 2, and the last one shortly before they took the final exam.

The goal of this experiment is to test two hypotheses: (1) recommendations (personalized and non-personalized) lead to better performance results, as measured by the student’s total score on the final exam; (2) personalized recommendations, as described in Section 2, lead to better performance results vis-à-vis providing non-personalized recommendations (as measured by the final exam score).

In addition, at the end of the semester we sent a survey to those students who had received at least one recommendation, in order to see how well they perceived our recommendations and also to detect possible biases and problems with the experimentation. In particular, we asked the students how much they liked our recommendations, i.e., what was their overall impression about the recommendations (vis-à-vis individual recommendations, as is normally done in recommender systems).

4. RESULTS
The results of the survey revealed that the vast majority of the students indeed liked our recommendations and found them to be very useful in their studies. However, when we measured the actual performance of the students on the final exam (as opposed to how much they liked the recommendations), our preliminary results showed that our recommendations were not uniformly effective for all the students across all the courses. In particular, the recommendations worked the best for the mediocre students and were less effective for the excellent and good students. Also, they were most effective for the poorly performing students taking business courses, where statistically significant performance differences on the final exam were detected in comparison to the control group. Further, we have also observed real performance differences on several other segments of students and types of courses. However, we could not demonstrate that these differences were statistically significant because of the sizes of our samples and the preliminary nature of our data and results. As a part of the future work, we plan to enhance our data and provide a more extensive analysis of it to demonstrate that personalized recommendations indeed lead to better performance results.

5. REFERENCES
[1] I. Ciuciu and Y. Demey. An evaluation methodology for c-foam applied to web-based learning. In AWBL. 2012.
[2] N. Manouselis, H. Drachsler, V. Katrien, and D. Erik. Recommender Systems for Learning. Springer, 2013.
[3] A. Mavroudi and T. Hadzilacos. Broadening the use of e-learning standards for adaptive learning. In Advances in Web-Based Learning. Springer Berlin, 2012.
[4] S. Saman, B. Seyed, Z. Nor, and N. Shahrul. Ontological approach in knowledge based recommender system to develop the quality of e-learning system. In Australian J. of Basic and Applied Science. 2012.
[5] X. Zhou, J. Chen, and Q. Jin. Discovery of action patterns in task-oriented learning processes. In Advances in Web-Based Learning. Springer, 2013.
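
The per-topic scoring and gap-identification rules of Section 2 can be illustrated with a minimal Python sketch. It is not the implementation used in the experiments: the function names, the data layout, the threshold of 0.7, and the subtopic fraction of 0.5 are all assumptions made for illustration, since the paper only speaks of "a certain threshold level" and "a sufficient number of subtopics".

```python
from collections import defaultdict


def topic_scores(answers, question_topics):
    """Fraction of correctly answered questions per topic.

    answers: dict mapping question id -> True/False (answered correctly).
    question_topics: dict mapping question id -> list of taxonomy topics.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for qid, is_correct in answers.items():
        for topic in question_topics.get(qid, []):
            total[topic] += 1
            correct[topic] += int(is_correct)
    return {topic: correct[topic] / total[topic] for topic in total}


def knowledge_gaps(taxonomy, root, scores,
                   score_threshold=0.7, subtopic_fraction=0.5):
    """Return the set of topics with a knowledge gap.

    taxonomy: dict mapping a topic -> list of its subtopics.
    A topic is a gap if (a) its score is below score_threshold, or
    (b) at least subtopic_fraction of its subtopics are gaps.
    Both parameter values are assumed, not taken from the paper.
    """
    gaps = set()

    def visit(topic):
        children = taxonomy.get(topic, [])
        for child in children:  # decide subtopics first (post-order)
            visit(child)
        low_score = topic in scores and scores[topic] < score_threshold
        gap_children = sum(1 for child in children if child in gaps)
        many_gap_children = (bool(children)
                             and gap_children / len(children) >= subtopic_fraction)
        if low_score or many_gap_children:
            gaps.add(topic)

    visit(root)
    return gaps


if __name__ == "__main__":
    # Joe answers 9 of 10 Rococo questions correctly.
    answers = {f"q{i}": i != 0 for i in range(10)}
    question_topics = {f"q{i}": ["Rococo"] for i in range(10)}
    print(topic_scores(answers, question_topics))

    # The parent/child structure below is assumed for illustration only.
    taxonomy = {
        "Art History": ["Renaissance in Italy", "Rococo"],
        "Renaissance in Italy": ["Florence", "High Renaissance"],
    }
    scores = {"Rococo": 0.94, "Renaissance in Italy": 0.8,
              "Florence": 0.5, "High Renaissance": 0.6}
    print(sorted(knowledge_gaps(taxonomy, "Art History", scores)))
```

In this sketch a topic can be flagged through rule (b) even when its own score is acceptable, which mirrors the idea that gaps in enough subtopics call for remedial action on the parent topic as well.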
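
The linking of taxonomy nodes to library units in Section 2 relies on a TF-IDF-based measure of correspondence whose exact form is not specified. The sketch below assumes one common choice, cosine similarity between TF-IDF vectors with a smoothed IDF, and uses hypothetical names (`tokenize`, `tfidf_vectors`, `best_unit`) and made-up unit texts; it should be read as one possible instantiation rather than the measure used in the experiments.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(units):
    """Build TF-IDF vectors for library units.

    units: dict mapping unit id -> text of the unit (e.g. a chapter).
    Returns (vectors, idf); idf uses the smoothed form
    log((1 + n) / (1 + df)) + 1, which is an assumed choice.
    """
    counts = {uid: Counter(tokenize(text)) for uid, text in units.items()}
    n = len(units)
    df = Counter()
    for c in counts.values():
        df.update(c.keys())
    idf = {term: math.log((1 + n) / (1 + d)) + 1.0 for term, d in df.items()}
    vectors = {uid: {term: tf * idf[term] for term, tf in c.items()}
               for uid, c in counts.items()}
    return vectors, idf


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_unit(topic_description, units):
    """Return the id of the unit most similar to the topic description."""
    vectors, idf = tfidf_vectors(units)
    query_counts = Counter(tokenize(topic_description))
    # Terms unseen in the library carry no weight in the query vector.
    query = {term: tf * idf[term]
             for term, tf in query_counts.items() if term in idf}
    return max(units, key=lambda uid: cosine(query, vectors[uid]))


if __name__ == "__main__":
    # Hypothetical unit texts, for illustration only.
    units = {
        "ch1": "Rococo painting and ornament in eighteenth century France",
        "ch2": "Florence and the early Renaissance sculpture of Donatello",
    }
    print(best_unit("Rococo art in eighteenth century France", units))
```

Once every taxonomy node has a linked unit, the recommendation step reduces to looking up the linked units for each knowledge gap node.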