Using Digital Textbooks as Knowledge Base to Detect Student Idea Development in Collaborative Discourse Jiuning Zhong1, Guangji Yuan1, Jiangwei Zhang1, and Mei-Hwa Chen1 1 University at Albany, State University of New York, Albany, 12203, NY, USA {jzhong,gyuan,jzhang1,mchen}@albany.edu Abstract. This paper presents a novel framework to monitor idea progression and novelty from learners’ discourse by using digital textbooks as the knowledge base by means of knowledge graphs. A knowledge graph depicts an idea in a tripartite consisting of two concepts and their relations extracted from students’ discourse and digital textbooks. A progressive mapping between the knowledge graphs extracted from the digital textbooks and the ones continuously collected from the students’ discourse can be used to monitor the idea progression and de- tect idea novelty, which can greatly improve the teachers’ and the students’ awareness of learning outcome. 1. Introduction Contemporary pedagogies encourage students to build deep knowledge in core cur- riculum areas through collaborative discourse and inquiry, making productive use of textbooks and other sources to scaffold, not to limit, their thinking. This study aims to design AI-empowered techniques to assess student idea development in collaborative online discourse in relation to core disciplinary concepts presented in textbooks, focus- ing on retrieving core concepts from textbooks and student discourse entries and ana- lyzing the novelty of student ideas and questions. Our approach draws upon a knowledge graph method to conduct automated textual analysis of learner discourse contributions. This paper presents an automated frame- work for extracting key concepts and ideas from textbooks and student online dis- course, tracking student idea development using knowledge graph, and gauging idea novelty. The key elements of our framework include: (i) constructing and leveraging a knowledge base in the format of knowledge graphs using the triple units extracted from digital textbooks, (ii) constructing both personal (individual-centric) and collective (group-centric) temporal knowledge graphs from learner discourse for further idea anal- ysis, and (iii) analyzing the progression of ideas and capturing novel ideas using multi- dimensional measures. 2. Knowledge Graph We define a knowledge graph as G = {S, R, T}, where S, R, and T are sets of source entities, relations, and target entities, respectively. An idea from either online discourse or digital textbooks is a proposition with a triple semantic unit of two entities and the relation between them, where a proposition is denoted as a triple (s, r, t) ∈ P, which is presented by a union of two entities (source entity and target entity) and a relation, which forms a meaningful statement. Copyright © 2020 for this paper by its authors. Use permitted under Creative Com- mons License Attribution 4.0 International (CC BY 4.0). 2 Our definition of an idea focuses not only on specific concepts but also on relations, which can also be extended to idea contexts in the form of a subgraph. Because, sur- rounding entities of an entity can represent the local information. Furthermore, our ap- proach focuses on the relation between entities and also the context of ideas. 2.1 Idea Novelty In collaborative discourse, students do not merely share and rephrase what they have learned from textbooks but make connected and non-redundant contributions to ad- vance the group’s knowledge and go beyond what they already know. They continually generate deeper questions and build on one another’s ideas to develop higher levels of understandings. Thus, our analytics detect the novelty of student ideas posted in online discourse over time. We define idea novelty as the extent to which an online post presents unique and relevant information that goes beyond what has already been posted in a temporal thread of conversation. The new contribution may be in the form of a new idea (thought) or question, and uniqueness may be gauged at a personal or group level. Novel ideas hidden in the discourse text have high dimensional textual features that can be evaluated using multidimensional measures. In addition, novelty can be out of knowledge base information or novel alias for existing information in the knowledge base. During online discussions, we monitor student idea development and identifying their novel ideas based on the mapping of constructed knowledge graphs. Our novelty rubric in- cludes new concept, new question, new relation, and new context. 3. The Design of Novel Automated Framework Our novel automated framework (shown in Figure 1) consists of two systems, online system and offline system, both sharing the Natural Language Understanding Compo- nent that processes raw unstructured textual data into triple units for knowledge graph construction. The offline system focuses on knowledge acquisition and knowledge base graph construction from digital textbooks and external sources on the web, which are highlighted in dark blue. The online system (highlighted in yellow) mainly supports many learners concurrently and connects with the novelty analysis component that uses incoming information from three knowledge graphs to conduct deeper analysis and sends the idea analysis feedback back to learners. Fig. 1. A Novel Automated Framework of Idea Novelty Detection. 3 The significance of our novel automated framework is: 1) the framework connects each learner’s input into collective and personal knowledge graphs simultaneously and joins with knowledge base graph for idea analysis; 2) to meet our novelty rubric, the idea analysis component relies on analyzing multidimensional features of the learner input, which include knowledge graph context feature, entity semantic related features, entity centrality feature, and temporal feature; 3) the output of the idea analysis has two parts, personal level and collective (group) level, which enables evaluation for individ- uals as well as for the whole class. The multidimensional features are described in the following: Knowledge graph context feature: Given the constructed knowledge graph and new learner’s input, we adopt the two-stage embedding scheme [2] that takes into ac- count both contextual connectivity patterns and local connectivity patterns. Entity semantic related features: Given the content of digital textbooks and learner input, our system constructs appropriate representations for entities and relations. To get a low-dimensional, continuous, and dense semantic representation, we apply entity word embeddings [1] from textbook content and learner notes. Given an entity pair, the semantic relevance of the two entities can be represented by their cosine similarity and their Euclidean distance in the vector space. Entity Centrality feature: Centrality is shown to be intimately connected with the cohesive subgroup structure of a graph [4], which has been suggested as a good indica- tor of the importance of novelty. Temporal feature: Decay operates on the assumption that the learner note in a dis- cussion has a certain level of coherence, and therefore, show some cognitive continuity [3], and longer exposure of entities or relations would have diminishing influence on learners [5]. Each entity and relation contain the timestamp t of their creation, which would be the input of our novelty decay function. We have been testing and refining our framework based on an extensive dataset of online discussions collected from a set of Grade 5 science classrooms. Figures 2 and 3 show the mapping between the knowledge base graph and the temporal knowledge graph generated based on student online discussion. In this example the knowledge base graph (shown in Figure 2) was constructed from textual data in a digital textbook: “Pass the Energy Please”; and the corresponding online discourse graph (shown in Figure 3) was generated based on a fifth-grade classroom discussion. Fig. 2. Knowledge Base Graph. Fig. 3. Collective Temporal Graph. 4 When a triple unit is added to the graph, the framework checks if the entities and their relation exist on the knowledge base graph by applying distance-based and simi- larity matching based scoring functions to check entities. If there is a match, it selects and maps entities as highlighted in color on the two graphs, such as “Food” and “Plant” on the base graph. Otherwise, entities are not colored in the graph, like “Space” and “Global Warming” on the collective graph. Also, the new entities and relation are con- sidered new concepts and relation. Each entity and relation on the temporal graph con- tain additional temporal information such as “createdTime” that can be used by the novelty decay function to calculate an importance score and author’s name for tracking individual learner’s idea progression. Meanwhile, a novel context detection function compares the subgraph of each matched entity against the neighbors of the same entity in the base graph in the format of embedding scheme [2] that takes into account both contextual connectivity patterns and local connectivity patterns to see if a context nov- elty exists, for instance, “energy” in the dashed line rectangle on the collective graph (Figure 3) has a novel context as the same entity has a distinguishing subgraph on the base graph. On the base graph, “energy” is around “soil”, “creature”, and “plant”, which indicates a context of food chain concept. While on the collective graph, “energy” is surrounded by entities as “Marshmallow”, “Sun”, “Photosynthesis”, “Oxygen”, which indicates a new and novel context about energy generation by photosynthesis and en- ergy release. 4. Conclusion We have proposed a novel automated framework consisting of a knowledge graph construction task from learner discussions and digital textbooks, an idea assessment task including tracking idea development and identifying novel ideas based on the map- ping of knowledge graphs. Each important idea from digital textbooks or learner dis- course is extracted as a semantic triple unit of a proposition with two entities and a relation between them and further used to construct the corresponding knowledge graphs. The idea reasoning task analyzes the progression of ideas and captures idea novelty from multidimensional measures. During the construction of knowledge graphs, we faced many challenges, especially in subtasks like entity recognition and alignment, relation extraction due to colloquial language in learner discourse. We will apply additional training on the data and more advanced machine learning techniques to improve the precision and recall of the out- come. Upon all the work, we are creating visual abstractions of knowledge and ideas as knowledge graphs and multidimensional analytics of knowledge-building discourse that include idea progression, novelty, and digital textbook relevancy, which provide very valuable feedback to students and teachers. 5 References 1. Mikolov, T., Chen, K., Corrado, G., & Dean, J. Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013). 2. Luo, Y., Wang, Q., Wang, B., & Guo, L. Context-dependent knowledge graph embedding. in EMNLP, pp. 1656–1661 (2015). 3. Liu, H., Lieberman, H., & Selker, T. A model of textual affect sensing using real-world knowledge. In Proceedings of the Seventh International Conference on Intelligent User In- terfaces, pages 125–132 (2003). 4. Borgatti, S.P., Everett, M.G. A graph-theoretic perspective on centrality. Social Networks 28 (4), 466–484 (2006). 5. Feng, S., Chen, X., Cong, G., Zeng, Y., Chee, Y. M., & Xiang, Y. Influence maximization with novelty decay in social networks. In AAAI, pages 37–43 (2014).