=Paper=
{{Paper
|id=Vol-3605/8
|storemode=property
|title=EasyDKT: an Easy-to-use Framework for Deep Knowledge Tracing
|pdfUrl=https://ceur-ws.org/Vol-3605/8.pdf
|volume=Vol-3605
|authors=Gabriella Casalino,Mattia di Gangi,Francesco Ranieri,Daniele Schicchi,Davide Taibi
|dblpUrl=https://dblp.org/rec/conf/aixedu/CasalinoGRS023
}}
==EasyDKT: an Easy-to-use Framework for Deep Knowledge Tracing==
Gabriella Casalino (1,*), Mattia Di Gangi (2,*), Francesco Ranieri (2), Daniele Schicchi (3,*), Davide Taibi (3,*)

(1) Computer Science Department, University of Bari, Bari, Italy
(2) AppTek GmbH, Aachen, Germany
(3) Institute for Education Technology, National Research Council of Italy, Palermo, Italy

Abstract

The goal of knowledge tracing (KT) is to track a student's progress over time by analyzing their historical data, so as to predict their future performance on tests related to the topics they have covered. The rise of online platforms for education, where the learning process is embedded, has unlocked the potential of customized teaching, as in intelligent tutoring systems. Thanks to ongoing advancements in KT algorithms, teachers can now be aware of students' needs and recommend appropriate learning resources. They can also rank learning content, skipping or delaying content based on difficulty. In recent years, Deep Knowledge Tracing (DKT) has proven highly effective in solving KT tasks due to its ability to model complex long-range dependencies in test sequences, resulting in better prediction quality. The field of DKT is expanding, with numerous algorithms being proposed and implemented using various technologies. This paper introduces a new framework called EasyDKT, which simplifies the development and evaluation of DKT algorithms. The framework aims to offer users a high level of technological abstraction, with a modular structure covering data processing, evaluation metrics, and neural network models to be trained on custom datasets. Currently, EasyDKT supports PyTorch and TensorFlow, with plans to incorporate additional technologies in the future. Experiments on the ASSISTments skill-builder dataset 2009-2010 show a case study of students' data analysis through EasyDKT.

Keywords: Deep Knowledge Tracing, Education, Deep Learning, Artificial Intelligence

1.
Introduction

Learning is the process of obtaining fresh knowledge, adopting new behaviors, acquiring new skills, and developing new values, attitudes, and preferences. A meaningful learning experience involves integrating recently acquired information with existing knowledge, so that the acquired knowledge can be exploited in several situations and contexts. The learning process is a highly personalized experience encompassing many activities, such as reading, writing, listening, observing, thinking, and testing.

AIxEDU: 1st International Workshop on High-performance Artificial Intelligence Systems in Education, November 06–09, 2023, Rome, Italy. * Corresponding author. gabriella.casalino@uniba.it (G. Casalino); daniele.schicchi@itd.cnr.it (D. Schicchi); davide.taibi@itd.cnr.it (D. Taibi). ORCID: https://orcid.org/0000-0003-0713-2260 (G. Casalino); https://orcid.org/0000-0003-4491-0025 (M. Di Gangi); https://orcid.org/0000-0003-0154-2736 (D. Schicchi); https://orcid.org/0000-0002-0785-6771 (D. Taibi). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.

Recognizing that every student has a unique learning pace and unique requirements is crucial, so personalized learning is essential to optimize the learning experience and ensure maximum benefit. Moreover, each student has followed a personal learning path. In this sense, Knowledge Tracing (KT) supports personalized learning by analyzing students' previous interactions with specific topics to predict their performance on future tests. Teachers can use KT algorithms to pinpoint their students' learning needs and suggest relevant materials accordingly. This also allows them to prioritize learning content by postponing or skipping material that may be particularly challenging.
These advancements have significantly improved the educational experience for students and have enabled teachers to provide more individualized and effective instruction [1]. According to the Beijing Consensus [2], Artificial Intelligence (AI) can support personalized learning, offering systems capable of recognizing students' needs and providing them with valid support [3, 4, 5, 6, 7]. New studies on KT have leveraged AI to develop self-governing systems that monitor student competencies. Deep Knowledge Tracing (DKT) utilizes deep learning to enhance the analysis of the intricate, far-reaching connections in the assessment sequences that depict a student's abilities [8, 9]. DKT is an expanding area, with many algorithms developed using various technologies.

To make DKT easier to use, this paper proposes EasyDKT, a framework that offers users a high level of technological abstraction when implementing DKT. We adopted a modular design that makes the framework easy to extend with other DKT models and to integrate with custom datasets. In addition, EasyDKT abstracts the data processing and evaluation stages, facilitating tasks such as comparing several DKT models. It is implemented in Python and supports PyTorch [10] and TensorFlow [11]. Currently, EasyDKT implements the original DKT model proposed by Piech et al. [12]. This choice is due to the model's importance and to the difficulty the scientific community faces in implementing it with modern frameworks, since the original software libraries no longer work. In this way, we contribute to the research field by offering a modern, easily accessible version of the original DKT model that achieves the same performance. To validate the implementation, we present a case study using EasyDKT to analyze the well-known ASSISTments skill-builder dataset 2009-2010. Experiments have been conducted by varying the model's hyperparameters to validate the effectiveness of the proposed tool.
We achieved a maximum AUC of 0.84, very close to the 0.86 reported in [12].

The paper is organized as follows: Section 2 briefly introduces the main ideas behind Knowledge Tracing. Section 3 reviews the literature on Deep Knowledge Tracing, from the first model to more recent advancements. The EasyDKT framework is then presented in Section 4, together with details of the modules it comprises. The experimental design and the evaluation results are presented in Section 5. Finally, Section 6 concludes the paper and outlines future directions for this research.

2. Knowledge Tracing

Corbett and Anderson proposed the first model of Knowledge Tracing (KT) in their ACT Programming Tutor (APT), intended to guide students in Lisp programming activities [13]. As illustrated in Figure 1, the idea behind KT theory is that the student's knowledge can be modeled if the domain knowledge is organized into a hierarchical structure of skills proposed during the learning experience, so that students can master low-level skills before approaching the highest-level ones. It is assumed that students first acquire knowledge in declarative form, followed by the acquisition of domain-specific procedural knowledge through practical tasks. In particular, a set of rules (skills and sub-skills) that the student should have learned is defined, and for each exercise the probability that the student has learned each rule is evaluated. By analyzing the student's interactions, the system can recommend activities that improve the student's competencies, such as analyzing and studying simpler topics preparatory to the main topic [13].

Figure 1: Base schema of Knowledge Tracing.

Autonomous Knowledge Tracing belongs to the field of Intelligent Tutoring Systems (ITS), which utilize cognitive models to evaluate students' understanding. The system adapts its feedback and guidance based on the student's knowledge, thereby enhancing the quality and speed of learning.
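The per-rule probability tracking described above can be sketched as a classical Bayesian Knowledge Tracing update. This is an illustrative implementation with made-up parameter values, not the APT code:

```python
def bkt_update(p_known, correct, p_slip=0.1, p_guess=0.2, p_learn=0.3):
    """One Bayesian Knowledge Tracing step for a single skill.

    p_known: prior probability that the student already knows the skill.
    correct: whether the observed answer was correct.
    The slip/guess/learn values here are illustrative, not fitted.
    """
    if correct:
        # P(known | correct answer), via Bayes' rule
        evidence = p_known * (1 - p_slip) + (1 - p_known) * p_guess
        posterior = p_known * (1 - p_slip) / evidence
    else:
        # P(known | incorrect answer)
        evidence = p_known * p_slip + (1 - p_known) * (1 - p_guess)
        posterior = p_known * p_slip / evidence
    # The student may also learn the skill during this practice opportunity.
    return posterior + (1 - posterior) * p_learn

# Tracking one skill across a sequence of observed answers:
p = 0.2  # initial mastery estimate
for answer in [True, True, False, True]:
    p = bkt_update(p, answer)
```

A correct answer raises the mastery estimate and an incorrect one lowers it, which is the mechanism an ITS uses to decide when to recommend simpler preparatory material.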
A model is developed for the student by analyzing their progress in a sequence of tasks. The algorithm closely monitors each exercise's outcomes, noting any successful or unsuccessful attempts. This data is then used to predict how well the student will perform in subsequent exercises.

Probabilistic models based on Bayes' theorem were mostly used to estimate learners' knowledge states over time [14]. However, recent advancements in Deep Learning have shown its effectiveness in tackling KT. In Deep Knowledge Tracing, students are modeled individually. Each student's abilities are represented by the predicted probabilities of using specific skills to solve exercises. The model automatically extracts hidden skills from the student's past interactions and uses historical student data to predict their likelihood of mastering the next item and the skills involved, along with their probability. This enables identifying students who require extra assistance or recommending learning resources based on the acquired skills. For further information on this topic, please refer to [15].

3. Literature Review

Recurrent Neural Networks (RNNs) have commonly been used to address the KT problem, demonstrating impressive results in forecasting a student's performance by reviewing their previous interactions. These models examine the sequence of question-answer pairs {q_t, a_t} over a period to forecast the student's response at time t + 1. Piech et al. [12] were the first to introduce the concept of "Deep Knowledge Tracing". They suggested that the task should be formulated as a temporal one, because the student's knowledge increases over time. The authors experimented with both classical and LSTM RNNs to analyze the data representing the student's history. Their goal was to trace the acquired knowledge and predict future performance without hard-coding the student competencies, which is a demanding task requiring expert annotators.
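The question-answer sequence formulation above is usually fed to the network as one-hot vectors over (skill, correctness) pairs. A minimal sketch of this common DKT input scheme (the number of skills and the indexing convention are illustrative assumptions):

```python
import numpy as np

def encode_interactions(history, n_skills):
    """Encode a student's (skill_id, correct) history as one-hot vectors.

    Each timestep becomes a vector of length 2 * n_skills:
    position skill_id is set for an incorrect answer,
    position n_skills + skill_id for a correct one.
    """
    x = np.zeros((len(history), 2 * n_skills))
    for t, (skill, correct) in enumerate(history):
        x[t, skill + (n_skills if correct else 0)] = 1.0
    return x

# Three interactions of one student over a hypothetical 4-skill domain:
seq = encode_interactions([(0, True), (2, False), (0, True)], n_skills=4)
```

An RNN consumes these vectors timestep by timestep and outputs, at each step, a probability of a correct answer for every skill.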
This work leverages three different datasets: Simulated-5, Khan Math, and ASSISTments "skill builder". Simulated-5 involves 4,000 virtual students answering 50 exercises based on 5 concepts. The students' knowledge is modeled by Item Response Theory, and the skills improve gradually. Each exercise covers a specific concept and is labeled with a level of difficulty. Khan Math is a collection of data from Khan Academy, consisting of 1.4 million exercises completed by 47,495 students across 69 categories. ASSISTments is a dataset collected from an Intelligent Tutoring System that helps students with math problems. The tutor outputs a log of the actions performed by the student every time they correctly complete an exercise. This publicly available data covers the period 2009-2010 and is a significant resource for addressing the KT problem with machine learning.

Subsequently, Xiong et al. [16] revised the preliminary scores presented by Piech et al. [12], uncovering issues that had not been considered. The authors addressed these issues and tested the RNNs on more reliable datasets. The final results still show the high performance achieved by RNNs, but the performance gap with previous models was reduced.

Scientists have been studying the qualities of DKT since its introduction. According to Khajah et al. [17], RNNs incorporate the recency effect, meaning the model aligns with human reasoning processes, as it gives more importance to recent events than to past ones. Since DKT's input is the sequence of exercises a student receives, in the same order, it can contextualize trial sequences. This helps us understand how exercise sequences impact the student's learning. DKT can predict a student's performance on the next exercise based on their achievement history and can also determine the degree of relatedness among skills. However, the DKT system cannot determine whether a student has fully grasped a particular concept.
To tackle this problem, Zhang et al. [18] proposed a new model called Dynamic Key-Value Memory Networks (DKVMN). This model is able to learn the relationships between different concepts and to give an accurate assessment of a student's understanding level for each individual concept. DKVMN is inspired by the Memory-Augmented Neural Network [19], a neural network (NN) that exploits an external memory module to enhance the model's ability to capture long-term dependencies. DKVMN uses two memory modules: a static matrix for knowledge concepts (the key) and a dynamic matrix for the student's competency on each concept (the value). The DKVMN model was compared with the classical DKT model introduced by Piech et al. [12] on four distinct datasets. The results indicate that DKVMN performs better at tracking the student's knowledge and provides a comprehensive outline of each student's level of mastery of every concept.

Zhang et al. [18] have also suggested an alternative method to enhance the performance of RNNs on the KT problem. They recommend analyzing the range of additional features captured by computer-based learning platforms and incorporating them into DKT models. The authors conducted an experiment to determine the impact of three factors on student performance: response time, number of attempts, and whether the first action was to request help. They then proposed a method for incorporating this information into the RNN analysis. The results showed that augmenting the considered features led to better outcomes than the original DKT model. Minor changes were made to the original DKT structure to accommodate the richer set of student information and the contextual insights deemed important for achieving improved results.

Currently, deep learning models based on Transformers are at the forefront of solving tasks involving temporal sequences.
Such models have been exploited for KT in several respects. A study by Pandey et al. [20] focused on improving Deep Knowledge Tracing by utilizing self-attention-based neural networks. Their approach, called SAKT, analyzes a student's previous actions to determine which concepts they have mastered. SAKT outperforms previous deep learning-based systems in addressing the issue of sparse data, where students interact with only a few concepts, resulting in limited information. SAKT computes attention weights to determine the importance of completed exercises when predicting a student's performance on a given exercise. By visualizing these attention weights, it becomes easier to see which completed exercises the network relied on to make a prediction. This helps identify the relevant past exercises the student used to solve the current one. SAKT has been extensively tested on real-world datasets and has shown an average improvement of 4.43% in AUC compared to previous DL models.

Transformers [21] do not use feedback connections, tracking time only through positional encoding. However, recent developments have produced a new architecture called Transformer-XL, which includes a recurrence mechanism and an updated positional encoding scheme, allowing it to capture longer-term dependencies better than both RNNs and traditional Transformer models. He et al. [22] utilized the distinctive features of Transformer-XL to address the issues arising from analyzing lengthy input exercise sequences, which had negatively impacted the performance of past DL models. The system they developed, KT-XL, was thoroughly tested on three real-world datasets and compared with previous models such as DKT, DKVMN, and SAKT. KT-XL outperformed all the other models across the datasets, with an average improvement of 3.6%.
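The attention-weight computation at the heart of SAKT-style models can be sketched as generic scaled dot-product attention over past interactions (an illustration of the mechanism, not the authors' implementation; the embedding sizes are made up):

```python
import numpy as np

def scaled_dot_product_attention(queries, keys, values):
    """Scaled dot-product attention over past interactions.

    queries: (n, d) embeddings of the exercises to predict;
    keys, values: (m, d) embeddings of past interactions.
    Returns the attended values and the attention weights; each row of
    the weights shows how much each past interaction influenced a prediction.
    """
    d = queries.shape[-1]
    scores = queries @ keys.T / np.sqrt(d)          # (n, m) similarities
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax over the past
    return weights @ values, weights

rng = np.random.default_rng(0)
q = rng.normal(size=(1, 8))   # the next exercise
k = rng.normal(size=(5, 8))   # five past interactions
v = rng.normal(size=(5, 8))
out, w = scaled_dot_product_attention(q, k, v)
```

Inspecting `w` is what makes the visualizations described above possible: each weight says how strongly one completed exercise contributed to the current prediction.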
New directions in Deep Knowledge Tracing research aim at improving the modeling of students' knowledge. A cognitive representation of students' skills that overcomes the common assumption that questions contribute equally has been proposed in [23]. A module to interpret the prediction results has also been included, to facilitate the use of DKT for the analysis of student behavior. The use of augmented knowledge has been proposed to better model students' skills. In [24], hierarchical heterogeneous knowledge structures are modeled through knowledge graphs, whilst Tato et al. explored the use of multi-modal data to enhance the latent representations of students [25]. Spatial and temporal features obtained from students' activity history have been used to extract deeper hidden information in [26]: students' exercises are used to derive spatial information that is then connected with temporal characteristics. Results showed that more informative representations of students' knowledge help in creating effective user models, leading to better predictive results than state-of-the-art algorithms on similar tasks.

4. Framework

In this paper, we introduce EasyDKT, a user-friendly framework that enables users to experiment with DKT algorithms through a convenient command-line interface. The framework, shown in Figure 2, consists of four modules for data management, neural network model creation, experimentation, and validation. The framework's modules can be extended with interchangeable classes and functions to further enhance its functionality, which can easily be selected through a configuration file. EasyDKT implements the original neural network of Piech et al. [12]. However, since the technologies they used are no longer available, their experiments are not reproducible. Thus, the code has been refactored using two of the most widely used neural network libraries, TensorFlow and PyTorch.
Users can select their preferred library and compare results obtained with different libraries and settings (as we did in the experimental part). The configuration file outlines the data needed for training and evaluating the model and which module to use for data loading and preprocessing. It also specifies the DKT algorithm, its hyperparameters, and the evaluation metrics used to monitor training progress and final results. In particular, we considered the following hyperparameters, used to create and tune Deep Knowledge Tracing models:

- Library: deep learning library (PyTorch, TensorFlow - TF);
- Optimizer: RMSProp or Adam;
- Dropout: dropout rate;
- Hidden Units: number of LSTM hidden units;
- Batch size: number of sequences to process in a batch;
- Learning rate;
- Epochs: number of epochs;
- Time Window: number of timesteps to process in a batch.

The framework is meant to be used as a baseline for comparisons and as a base technology on which to build new algorithms. For this reason, we separated four modules, encapsulating the functionalities required during a Deep Knowledge Tracing process. The four modules are detailed in the following.

Data management: the first module is devoted to preparing student data in the format required by the given library. Only information related to the student and the answer to a given question is considered for processing.

Figure 2: EasyDKT framework.

Neural network model creation: based on the configuration settings, the neural network model is created. The underlying technologies are hidden in this module, which exposes an interface to communicate with the user through the configuration file.

Experimentation: data is divided into a training set to create the model and a testing set to evaluate it.
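Splitting sequential student data can be sketched as a per-student partition that keeps each interaction sequence intact (an illustration of the general approach; the function and field names are hypothetical, not EasyDKT's actual API):

```python
def split_students(sequences, train_fraction=0.8):
    """Partition whole student sequences into train and test sets.

    sequences: dict mapping student_id -> ordered list of
    (skill_id, correct) interactions. Whole sequences are kept
    together, since within-student order matters for DKT models.
    """
    ids = sorted(sequences)
    cut = int(len(ids) * train_fraction)
    train = {s: sequences[s] for s in ids[:cut]}
    test = {s: sequences[s] for s in ids[cut:]}
    return train, test

# Ten toy students, two interactions each:
data = {f"s{i}": [(0, True), (1, False)] for i in range(10)}
train, test = split_students(data)
```

Splitting by student, rather than by shuffled individual interactions, avoids leaking a student's later answers into training while their earlier answers sit in the test set.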
Since the data is analysed sequentially, and this sequence is crucial for the deep learning models, we considered a train-test setting rather than a more general cross-validation setting.

Validation: the predictive task is evaluated in terms of the standard classification measure Area Under the Curve (AUC). In addition, graphs of AUC values over epochs are generated in order to compare the stability and robustness of different configuration settings.

Figure 3: On the left: the worst case, achieved using PyTorch, the Adam optimizer, no dropout, and a learning rate of 0.01. On the right: the best performance, achieved with the same configuration as the worst case but reducing the learning rate to 0.0001.

5. Experiments

5.1. Data

The data used is the ASSISTments skill-builder dataset 2009-2010, created through an online platform and made freely available¹. This dataset includes mathematical skill-builder problems that students can solve, and their answers are recorded. The problems are designed to test specific skills, and some questions may require knowledge of multiple skills. To complete a test, students must answer three consecutive questions correctly. It is important to note that if a student uses any support or tutoring system provided by the platform itself, the question is marked as incorrect. Additionally, students receive instant feedback telling them whether they answered the question correctly. The dataset contains 4,217 problems and a total of 124 skills. As several students may solve the same problem, the dataset actually consists of 522,000 tuples. Each tuple comprises three parts: a student identifier (id), a skill identifier, and the answer to the problem. If a problem relates to multiple skills, there will be multiple tuples with the same student id and answer but different skill identifiers. Within the framework, the dataset was divided into 3,361 items for Deep Learning model training and 856 items for testing.

5.2.
Results

During the experimental phase, our main objective was to determine the most effective approach for implementing DKT via the PyTorch and TensorFlow deep learning libraries. To achieve this, we conducted a series of rigorous experiments, carefully adjusting various model parameters, including the dropout rate, learning rate, and optimizer (i.e. RMSProp and Adam), and ensuring a comprehensive evaluation.

The model evaluation process was carried out by computing the AUC (Area Under the ROC Curve) on the tests performed on the ASSISTments skill-builder data. The ROC curve is a statistical method that gauges the accuracy of a diagnostic test across the full spectrum of potential values; measuring the area beneath the ROC curve is a widely recognized approach for assessing machine learning models.

¹ https://sites.google.com/site/assistmentsdata/home/2009-2010-assistment-data/skill-builder-data-2009-2010

Figure 4: On the left: the worst case, achieved using TensorFlow, the Adam optimizer, no dropout, and a learning rate of 0.0001. On the right: the best performance, achieved with the same configuration as the worst case but increasing the learning rate to 0.001.

Figures 3 and 4 display the highest and lowest performance results we obtained while conducting our experiments with the PyTorch and TensorFlow frameworks. We experimented with various configurations by adjusting the model's hyperparameters and observing the AUC over 50 epochs. A full overview of the conducted experiments can be found in Table 1. Concerning PyTorch, the worst performance was observed with the Adam optimizer and a learning rate of 0.01. In this case, the AUC value obtained at each iteration becomes more unstable. The best performance was achieved with the same configuration but with the learning rate reduced to 0.0001.
Starting from the twelfth training cycle, the AUC value stabilized at 0.84 until the end of the run, a 2% increase in the final AUC value. With its best configuration, our framework achieves an AUC score of 0.84, comparable to the original score of 0.86 reported in [12], which was obtained using outdated technologies that are now difficult to replicate.

Figure 4 instead reports the results of our framework when using TensorFlow. The lowest performance was observed with the configuration that employed the Adam optimizer, no dropout, and a learning rate of 0.0001. Using the configuration that performed best with PyTorch brought no improvement with TensorFlow: the AUC value remained constant at 0.76, worse than the previous performance. The highest performance, on the other hand, was achieved with the same configuration as the worst case but with a learning rate of 0.001. In this case, changing the optimizer from RMSProp to Adam does not significantly impact performance, as the AUC fluctuates between 0.78 and 0.79.

We observed that the learning rate has the most significant impact on the model's performance. In addition, even with the same configuration, models developed with the TensorFlow and PyTorch libraries show different performance. This highlights the differences between these two libraries, despite both aiming at the same goal. Probably, different initialization parameters affect the training of the model, leading to a result gap of 5%. In conclusion, the PyTorch implementation with the Adam optimizer provided the best results. This highlights the importance of having an easy-to-use, parameterizable workflow for DKT in order to quickly identify the best configuration for a given problem.

Table 1: Results of experiments with various configurations obtained by adjusting the model's hyperparameters. AUC is the measure used to evaluate each model configuration over 50 epochs.

#  Lib      Opt      Dropout  Hidden units  Batch size  LR      TW   Epochs  AUC
1  PyTorch  RMSProp  None     200           5           0.001   100  50      0.82
2  PyTorch  RMSProp  0.6      200           5           0.001   100  50      0.82
3  PyTorch  Adam     None     200           5           0.001   100  50      0.82
4  PyTorch  Adam     None     200           5           0.01    100  50      0.78
5  PyTorch  Adam     None     200           5           0.0001  100  50      0.84
6  TF       RMSProp  None     200           5           0.001   100  50      0.78
7  TF       Adam     None     200           5           0.001   100  50      0.79
8  TF       Adam     None     200           5           0.0001  100  50      0.76

6. Conclusion and Future Works

We have introduced a modular framework that makes Deep Knowledge Tracing more accessible and efficient. Our framework incorporates PyTorch and TensorFlow, the two most significant deep learning libraries, giving users the flexibility to choose their preferred one while encapsulating the implementation details, so that users are not required to know these libraries and their syntax. A simple configuration file is used to define the experimental setup. EasyDKT is a first attempt to develop a simple tool for DKT built on current technologies. It has been conceived as the core of a more complex tool in which more recent DKT methodologies will be encapsulated as hierarchical building blocks. A case study exploring the use of EasyDKT on the ASSISTments skill-builder dataset has been presented. In particular, we studied how the neural network hyperparameters affect the learning performance of the tool. Our experiments demonstrated that, when PyTorch is utilized, our framework attains state-of-the-art performance.
However, some challenges were encountered when using TensorFlow. Our future work involves improving the software structure of the framework to enable the execution of various algorithms with multiple execution parameters, which can lead to more accurate outcomes. Additionally, we aim to make the NN model structure modifiable according to the preferences of experienced users. We plan to analyze specific components of the TensorFlow implementation, such as tensor initialization, and compare them with those of Theano in order to improve on the current results (AUC = 0.78). We also propose modifying the dataset reading component to make it more adaptable to different datasets and implementations. This will make the framework more versatile and usable with different datasets, structures, and formats. Finally, we plan to enhance the tool by including more advanced DKT algorithms and by providing an intuitive interface that facilitates the user experience.

Acknowledgment

Gabriella Casalino acknowledges funding from the European Union PON project Ricerca e Innovazione 2014-2020, DM 1062/2021. This work is partially funded by Bando per Progetti di Ricerca GNCS 2023 - CUP E53C22001930001. G.C. is a member of the INdAM GNCS research group.

References

[1] A. T. Corbett, J. R. Anderson, Knowledge tracing: Modeling the acquisition of procedural knowledge, User Modeling and User-Adapted Interaction 4 (1994) 253–278.
[2] UNESCO, Beijing consensus on artificial intelligence and education, 2019.
[3] D. Taibi, G. Fulantelli, V. Monteleone, D. Schicchi, L. Scifo, An innovative platform to promote social media literacy in school contexts, in: ECEL 2021 20th European Conference on e-Learning, Academic Conferences International Limited, 2021, p. 460.
[4] G. Lo Bosco, G. Pilato, D. Schicchi, DeepEva: a deep neural network architecture for assessing sentence complexity in Italian and English languages, Array 12 (2021) 100097.
[5] D. Schicchi, G. Pilato, G. L.
Bosco, Attention-based model for evaluating the complexity of sentences in English language, in: 2020 IEEE 20th Mediterranean Electrotechnical Conference (MELECON), IEEE, 2020, pp. 221–225.
[6] G. Casalino, G. Castellano, G. Zaza, Neuro-fuzzy systems for learning analytics, in: International Conference on Intelligent Systems Design and Applications, Springer, 2021, pp. 1341–1350.
[7] G. Casalino, G. Castellano, C. Mencar, Incremental and adaptive fuzzy clustering for virtual learning environments data analysis, in: 2019 23rd International Conference Information Visualisation (IV), IEEE, 2019, pp. 382–387.
[8] G. Casalino, L. Grilli, P. Limone, D. Santoro, D. Schicchi, et al., Deep learning for knowledge tracing in learning analytics: an overview, TeleXbe (2021).
[9] X. Song, J. Li, T. Cai, S. Yang, T. Yang, C. Liu, A survey on deep learning based knowledge tracing, Knowledge-Based Systems 258 (2022) 110036.
[10] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, et al., PyTorch: An imperative style, high-performance deep learning library, Advances in Neural Information Processing Systems 32 (2019).
[11] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Jozefowicz, L. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke, V. Vasudevan, F. Viégas, O. Vinyals, P. Warden, M. Wattenberg, M. Wicke, Y. Yu, X. Zheng, TensorFlow: Large-scale machine learning on heterogeneous systems, 2015. URL: https://www.tensorflow.org/.
[12] C. Piech, J. Bassen, J. Huang, S. Ganguli, M. Sahami, L. J. Guibas, J. Sohl-Dickstein, Deep knowledge tracing, Advances in Neural Information Processing Systems (2015) 505–513.
[13] A. T. Corbett, J. R.
Anderson, Knowledge tracing: Modeling the acquisition of procedural knowledge, User Modeling and User-Adapted Interaction (1994) 253–278.
[14] O. Bulut, J. Shin, S. N. Yildirim-Erbasli, G. Gorgun, Z. A. Pardos, An introduction to Bayesian knowledge tracing with pyBKT, Psych 5 (2023) 770–786.
[15] G. Abdelrahman, Q. Wang, B. Nunes, Knowledge tracing: A survey, ACM Computing Surveys 55 (2023) 1–37.
[16] X. Xiong, S. Zhao, E. G. V. Inwegen, J. E. Beck, Going deeper with deep knowledge tracing, in: Proceedings of the 9th International Conference on Educational Data Mining, 2016, pp. 545–550.
[17] M. Khajah, R. V. Lindsey, M. C. Mozer, How deep is knowledge tracing?, arXiv:1604.02416v2 (2016).
[18] L. Zhang, X. Xiong, S. Zhao, A. Botelho, N. T. Heffernan, Incorporating rich features into deep knowledge tracing, in: L@S '17: Proceedings of the Fourth (2017) ACM Conference on Learning @ Scale, 2017, pp. 169–172.
[19] A. Graves, G. Wayne, M. Reynolds, T. Harley, I. Danihelka, A. Grabska-Barwińska, S. G. Colmenarejo, E. Grefenstette, T. Ramalho, J. Agapiou, et al., Hybrid computing using a neural network with dynamic external memory, Nature 538 (2016) 471–476.
[20] S. Pandey, G. Karypis, A self-attentive model for knowledge tracing, arXiv preprint arXiv:1907.06837 (2019).
[21] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017) 5998–6008.
[22] Y. He, X. Hu, Z. Zu, G. Sun, KT-XL: A knowledge tracing model for predicting learning performance based on Transformer-XL, in: ACM TURC '20: Proceedings of the ACM Turing Celebration Conference - China, 2020, pp. 175–179. doi:10.1145/3393527.3393557.
[23] J. Chen, Z. Liu, S. Huang, Q. Liu, W. Luo, Improving interpretability of deep sequential knowledge tracing models with question-centric cognitive representations, arXiv preprint arXiv:2302.06885 (2023).
[24] Q. Ni, T. Wei, J. Zhao, L. He, C.
Zheng, HHSKT: A learner-question interactions based heterogeneous graph neural network model for knowledge tracing, Expert Systems with Applications 215 (2023) 119334.
[25] A. Tato, R. Nkambou, Towards a multi-modal deep learning architecture for user modeling, in: The International FLAIRS Conference Proceedings, volume 36, 2023.
[26] L. Lyu, Z. Wang, H. Yun, Z. Yang, Y. Li, Deep knowledge tracing based on spatial and temporal representation learning for learning performance prediction, Applied Sciences 12 (2022) 7188.