Predicting Student Performance Based on Lecture Materials Data Using Neural Network Models

Sukrit Leelaluk1, Tsubasa Minematsu1, Yuta Taniguchi1, Fumiya Okubo1, Atsushi Shimada1
1 Graduate School of Information Science and Electrical Engineering, Kyushu University, Japan

Abstract
Student performance prediction is essential in learning analytics: by analyzing students' learning behavior, at-risk students can be discovered early enough for intervention and support. This study transforms students' reading behavior into a two-dimensional matrix input based on the reading behavior on each lecture material. The matrix input is updated by accumulating values each week, so that performance can be predicted week by week. A multilayer perceptron neural network receives the matrix input and classifies each student as at-risk or no-risk. This study compares the accuracy of a model based on content information with one based on weekly information. We also investigate the effect of switching the order of the learning materials, the feature importance of the reading operations in the event stream, and the difference in reading behavior between at-risk and no-risk students. These findings can help instructors intervene early to support at-risk students.

Keywords
Learning Analytics, Performance Prediction, Neural Networks

1. Introduction
Performance prediction is an educational data mining task in which students' learning activity logs are collected and analyzed, and the output is used to understand students' learning behavior. The purpose of performance prediction is to increase students' academic success, or to discover at-risk students early and intervene before they drop out or fail [1, 2]. Recently, machine learning (ML) and deep learning (DL) have been used for performance prediction, because ML and DL can learn and recognize patterns in large volumes of data and predict outcomes with good accuracy [3]. Much research proposes to predict students' performance early by using activity logs arranged week by week as a time sequence. However, weekly data collected from education systems may contain break weeks, and the same course may have a different length in different semesters. These problems can degrade accuracy. Furthermore, instructors can adjust the teaching style of a course at any time, so using past activity data as training data to predict students' performance in a course with a different format may also degrade prediction performance. This situation is similar to using a single-course predictive model to predict another course [8]. For these reasons, this study transforms the data to be based on lecture materials, since lecture materials are the fundamental objects used to hold a lecture. In addition, this paper relies on a neural network (NN) to derive a high-performance classification model from students' reading behavior. This can help instructors identify at-risk students during the course period and intervene early.
2. Related Works
The prediction of students' performance has become a challenge in learning analytics. The previous literature presents different techniques to predict students' performance, such as statistical methods, machine learning, and deep learning. González et al. [4] used ML and a multilayer perceptron NN (MLP) to analyze data from a massive learning management system (LMS) at 10%, 25%, 33%, and 50% of the course length to detect at-risk, failing, and excellent students. Okubo et al. [5, 6] studied a method to predict students' final grades in multiple courses using a recurrent neural network (RNN) to analyze the time-series data. Murata et al. [7] used knowledge distillation to compress an RNN model into a smaller student model for early detection of at-risk students. Finally, Conijn et al. [16] predicted students' final grades and the predictor variables of 17 blended courses using standard and logistic regression on each week of Moodle LMS data. These previous works predicted students' performance by considering the event stream of reading behavior and other factors on a conventional weekly basis. In contrast, we present a prediction that considers the reading behavior on each learning material that instructors used in the lecture. This study transforms the reading behavior data into a matrix input that describes each student's reading information.

3. Datasets and Preparation
3.1. Data Collection
The clickstream data were obtained from the International Conference on Learning Analytics & Knowledge 2022 Data Challenge [9, 10]. The dataset consists of two types of lectures (A and B), held as onsite classes in 2019 and online classes in 2020 at the same institution. A summary of the experimental data is shown in Table 1.

Table 1
The summary of the four courses used in the experiments.

Course Name   Length of course   Number of students   Number of lecture materials
A – 2019      8 weeks            50                   21
A – 2020      7 weeks            62                   12
B – 2019      8 weeks            163                  8
B – 2020      7 weeks            93                   9

3.2. Features
The features are derived from the operations in the event stream, comprising the total operation counts and the reading time, computed using the OpenLA library [11]. The features used in the experiments are listed in Appendix A.

3.3. Data Preprocessing
In this study, the raw data are transformed into two-dimensional matrices that are fed into the model [12, 13]. The matrix consists of m rows, where m equals the number of lecture materials (C), and n columns, where n equals the number of features (F), as displayed in Figure 1. Each element X_mn represents the clickstream value for one operation on one lecture material.

Figure 1: An example of the matrix input, where 'C' and 'F' refer to the order of contents and features, respectively.

The content-information matrix arranges the features in the order of the learning materials from the previous data. The matrix accumulates feature values according to students' reading behavior, independently of the time series. This makes the model robust to instructors switching contents or not using a content in a lecture, so we can use past data for prediction where a time-dependent model cannot: if an instructor does not use a learning material in the week it was originally taught, the data values change significantly, which may degrade a time-dependent model's prediction performance.
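To make this preprocessing concrete, the following is a minimal sketch of how such a C x F matrix could be accumulated from an event stream. It is an illustration under assumptions, not the authors' code: the DataFrame columns ('contentsid', 'operationname', 'reading_time_min') are hypothetical names for the LAK22 export, and plain pandas stands in for the OpenLA library [11].

```python
import numpy as np
import pandas as pd

# Subset of the Appendix A features, shortened for brevity.
FEATURES = ["OPEN", "CLOSE", "NEXT", "PREV", "TOTAL_ACTION", "READING_TIME_MIN"]

def build_content_matrix(events: pd.DataFrame, materials: list) -> np.ndarray:
    """Accumulate one student's clickstream into a (C x F) matrix.

    `events` is assumed to hold one row per operation, with hypothetical
    columns 'contentsid', 'operationname', and 'reading_time_min'.
    """
    X = np.zeros((len(materials), len(FEATURES)))
    row = {m: i for i, m in enumerate(materials)}
    col = {f: j for j, f in enumerate(FEATURES)}
    for _, e in events.iterrows():
        i = row.get(e["contentsid"])
        if i is None:                          # content not used in this course
            continue
        if e["operationname"] in col:          # count the individual operation
            X[i, col[e["operationname"]]] += 1
        X[i, col["TOTAL_ACTION"]] += 1         # every event counts toward the total
        X[i, col["READING_TIME_MIN"]] += e["reading_time_min"]
    return X
```

Because the values only accumulate, the same function can be called on the events logged up to any given week to produce that week's snapshot of the matrix.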
From the data in Table 1, the number of contents differs between courses. The matrix for each course is therefore adjusted to have equivalent rows and columns, using as the reference the matrix size of the course with the most lecture materials of the same lecture type. Missing values are imputed with the average value of each feature over the students in the same course. All values in the matrix are normalized by robust standardization, which rescales features using statistics that are robust to outliers.

3.4. Model
This study creates an MLP prediction model for each lecture type, trained on the learning behavior in all lecture weeks. The MLP consists of two hidden layers with ReLU activation functions. The output layer has two units for classifying students as no-risk or at-risk, with a softmax activation function, and Adam is used as the optimizer. The matrix is flattened into a vector of size m × n by placing its m rows next to each other, and this vector is fed into the model.

Figure 2: The concept of inputting a matrix into the independent MLP model.

The experiment predicts students' performance every week during the course period. The matrix is updated by accumulating data week by week until the last week of the course, as shown in Figure 2.

3.5. Evaluation Criteria
The evaluation treats students' performance as a binary classification: students who receive grades A, B, or C form the no-risk group, and students who receive grades D or F form the at-risk group [7]. Prediction performance is evaluated using accuracy. Table 2 shows the distribution of students' final grades in each course.

Table 2
The summary of the distribution of students' final grades.

Course Name   A    B    C    D   F
A – 2019      24   6    4    6   10
A – 2020      22   24   6    3   7
B – 2019      26   104  30   2   2
B – 2020      37   38   12   2   4
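As a concrete reference for Sections 3.3 and 3.4, here is a minimal Keras sketch of this architecture. It is written under assumptions: the paper does not report the hidden-layer widths, the loss function, or the training schedule, so the values below are illustrative placeholders, and scikit-learn's RobustScaler stands in for the robust standardization described above.

```python
import numpy as np
from sklearn.preprocessing import RobustScaler
from tensorflow import keras

def build_mlp(n_materials: int, n_features: int) -> keras.Model:
    """Two ReLU hidden layers and a two-unit softmax output (Section 3.4)."""
    model = keras.Sequential([
        keras.layers.Input(shape=(n_materials * n_features,)),  # flattened C x F matrix
        keras.layers.Dense(128, activation="relu"),   # hidden widths are assumptions
        keras.layers.Dense(64, activation="relu"),
        keras.layers.Dense(2, activation="softmax"),  # no-risk vs. at-risk
    ])
    model.compile(optimizer="adam",                   # Adam, as in the paper
                  loss="sparse_categorical_crossentropy",  # assumed loss
                  metrics=["accuracy"])
    return model

# X: (n_students, C, F) accumulated matrices; y: 0 = no-risk, 1 = at-risk.
# The matrix is flattened row by row, then robust-scaled before training.
# X_flat = RobustScaler().fit_transform(X.reshape(len(X), -1))
# build_mlp(C, F).fit(X_flat, y, epochs=50, batch_size=16)
```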
4. Experiments & Results
4.1. The efficiency of the model considering content information
This experiment compares the prediction results of a model that considers weekly behavior with content information (weekly w/ content) against a model that considers weekly behavior without content information (weekly w/o content). Many researchers have previously predicted students' performance using data on a conventional weekly basis; for instance, Okubo et al. [5, 6] predicted students' final grades using an RNN for each week. The weekly prediction data are collected from the reading behavior between the start of the current week's lecture and the start of the next week's lecture, and are input into the prediction model as a vector of 1 row and n columns, using the same features as the content-information model. We evaluate the proposed method with the A-2019 and B-2019 courses as training data and the A-2020 and B-2020 courses as test data. From the correspondence of the lecture materials between both lecture types in the raw data, the instructor adjusted the teaching style of the courses by changing the number of contents: ten contents of the A-2019 course and one content of the B-2019 course were no longer used in the 2020 lectures, while the A-2020 and B-2020 courses use two and one new contents, respectively. The matrix input for the training and test sets is therefore adjusted to have equivalent rows and columns, using the matrix size of the course with the most lecture materials of the same lecture type as the reference. Missing values are imputed with the mean value of each feature over the students. However, the A-type lecture in 2019 has 21 lecture materials, comprising 12 main contents and 9 summaries of the main contents; in 2020, the instructor kept only the main contents and added new contents. In addition, in the A-2019 lecture, students read the main contents significantly more than the summarized contents. We therefore transform the training matrix of the A-2019 course to the same size as the test matrix of the A-2020 course.

Figure 3: Accuracy of the model considering content information versus the model considering conventional weekly information on the (a) A-2020 and (b) B-2020 courses.

The results show that the weekly w/o content model is better than the weekly w/ content model on the A-type lecture, while the two models are similar on the B-type lecture. Both prediction models achieve higher accuracy on the B-type lecture than on the A-type lecture, as shown in Figure 3. Observing the predictions of each model, the weekly w/o content model classifies final-grade A and B students as no-risk better than the weekly w/ content model on both lecture types. Both models identify final-grade C students much better in the B-type lecture than in the A-type lecture. Both models detect at-risk students better on the A-type lecture than on the B-type lecture in every week, because the training data contain more at-risk students for the A-type lecture (16 students) than for the B-type lecture (4 students), which helps the models learn the pattern of at-risk students on the A-type lecture. The reading behavior patterns of no-risk and at-risk students are shown in Section 4.2.

Figure 4a: The effect of switching the order of the learning materials on the weekly w/ content model on the (a) A-2020 and (b) B-2020 courses.

Figure 4b: The effect of switching the weekly order on the weekly w/o content model on the (a) A-2020 and (b) B-2020 courses.

Furthermore, we simulated the situation in which instructors change the order of the learning materials, changing the input information as may happen in an actual class (Figures 4a and 4b). We conducted these experiments by switching the order of the learning materials in the weekly w/ content input, and the order of the weeks in the weekly w/o content input, on the test data every week, and feeding them into the same models as in the previous experiments; a sketch of this procedure follows below. Switching the contents caused the accuracy of the weekly w/ content model to drop in some weeks of the course period, especially on the A-type lecture. The reason is that the weekly w/ content model mainly considers the reading behavior on each content, so switching the order of the materials changes the feature values. In addition, instructors and students may not give each content the same importance every year, so the per-content rows of the matrix contain data that the weekly w/ content model has never been trained on, causing an accuracy drop (Figure 4a). On the other hand, switching the weekly information changed the accuracy of the weekly w/o content model in some weeks, but this model still achieved better accuracy than the weekly w/ content model, because it mainly considers the total operation counts and reading time in each week.
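The switching procedure can be summarized in a short sketch, under the assumption that the simulation amounts to permuting the rows of each student's input: lecture-material rows for the weekly w/ content model, weekly vectors for the weekly w/o content model. The paper does not publish the exact procedure, so this is illustrative only.

```python
import numpy as np

rng = np.random.default_rng(seed=0)

def switch_rows(X: np.ndarray) -> np.ndarray:
    """Randomly permute the rows of one student's (C x F) content matrix
    (or stack of weekly vectors) to simulate a reordering of materials."""
    return X[rng.permutation(X.shape[0])]

# Evaluate the already-trained model on the permuted test inputs:
# X_sw = np.stack([switch_rows(x) for x in X_test])
# model.evaluate(X_sw.reshape(len(X_sw), -1), y_test)
```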
Considering students' behavior such as reading time, we find that the no-risk group takes time to read regularly, whereas the at-risk group did not spend much time reading. This allows the weekly w/o content model to identify students' classes better than the weekly w/ content model (Figure 4b).

4.2. Analysis of feature importance
In the previous experiments, we constructed the matrix and vector inputs using the features explained in Section 3.2. We now try to reduce the matrix size by exploring feature importance with a Random Forest classifier. This method gives insight into which features are informative for classification [14].

Figure 5: The relative feature importance (%) of the A-type and B-type lectures computed by Random Forest.

This experiment uses the vector data from the weekly w/o content model to explore feature importance. We found that 'NEXT,' 'PREV,' 'READING_TIME_MIN,' and 'TOTAL_ACTION' received the highest feature importance scores (as percentages) on both lecture types. These features correspond to the operations most useful for reading learning materials. We further analyze students' reading behavior for each final grade using the heatmap in Figure 6. The heatmap was computed as the average of the operations over the students with each final grade; rich yellow means a feature has the most actions, and purple means a feature has few actions. The no-risk students performed the most operations in the event stream, meaning that they devoted much attention and time to reading many pages of the learning materials. The at-risk students, on the other hand, did not focus on reading the lecture materials as much as the no-risk students, which led them to fail. We then adjusted the matrix of the weekly w/ content model by reducing its size to only the high-scoring features ('NEXT,' 'PREV,' 'READING_TIME_MIN,' and 'TOTAL_ACTION') and compared it with the full feature set. Reducing the matrix size raised prediction accuracy on the A-type course, because the essential features make it easier for the model to recognize the difference between at-risk and no-risk students. However, the B-type course showed no difference when using only the high-scoring features, since the data on at-risk students were few (Figure 7).

Figure 6: Heatmaps of the average reading behavior of students in the (a) A-2020 and (b) B-2020 courses who received grades A, C, and F, respectively.

Figure 7: Accuracy with the selected features on the (a) A-2020 and (b) B-2020 courses.
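For reference, the feature-ranking step above could be implemented with scikit-learn as follows. This is a minimal sketch assuming the weekly w/o content vectors are stacked into an array X of shape (samples, 19) with binary at-risk labels y; the number of trees is an assumption, as the paper does not report the Random Forest configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

# The 19 features of Appendix A, in order F1-F19.
FEATURE_NAMES = [
    "OPEN", "CLOSE", "NEXT", "PREV", "PAGE_JUMP", "ADD BOOKMARK",
    "BOOKMARK_JUMP", "ADD MARKER", "ADD MEMO", "CHANGE MEMO",
    "DELETE BOOKMARK", "DELETE MARKER", "DELETE_MEMO", "SEARCH",
    "SEARCH_JUMP", "GETIT", "NOTGETIT", "TOTAL_ACTION", "READING_TIME_MIN",
]

def rank_features(X: np.ndarray, y: np.ndarray):
    """Fit a Random Forest and return features sorted by relative importance (%)."""
    rf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)
    order = np.argsort(rf.feature_importances_)[::-1]
    return [(FEATURE_NAMES[i], 100 * rf.feature_importances_[i]) for i in order]

# for name, score in rank_features(X, y):
#     print(f"{name:>18}: {score:5.2f}%")
```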
5. Discussion
From the experimental results, we found that predicting students' performance by considering learning content data can classify students who tend to fail, i.e., at-risk students, provided that sufficiently many reading behavior patterns of at-risk students are available. In addition, we used a heatmap to investigate the weekly reading behavior of no-risk and at-risk students and how it affects model performance. After the 3rd and 4th weeks of lectures, at-risk students performed only a few reading actions on the learning materials, whereas no-risk students continued reading and reviewing many learning materials until the end of the course (Figure 8). This allows a model to separate the reading behavior of at-risk and no-risk students: the reading behavior of at-risk students shows few operations on the content materials, meaning that these students spent little time reading. We also plot the average reading time of all students grouped by final grade in Figure 9. Again, students in the no-risk group spent significantly more reading time than at-risk students, consistent with the heatmap results. Considering the reading behavior together with the prediction results of the content-information model, instructors can intervene with students who may fail at around half of the course period, such as the 3rd or 4th week of a quarter course. Regarding the timing of early intervention compared with related literature, such as [5] and [6], our model can likewise detect at-risk students at around half of the course.

Figure 8: Heatmaps of the average reading behavior of students in the B-2020 course in the 2nd, 4th, and 7th weeks, for (a) final-grade A students and (b) final-grade F students.

Figure 9: The average reading time of students grouped by final grade on the (a) A-2020 and (b) B-2020 courses.

6. Conclusion & Future Works
This study employed a multilayer perceptron neural network to classify students as at-risk or no-risk in order to support instructors in early intervention. The experimental data were collected from two different lectures held onsite and online in different years. The model for identifying at-risk and no-risk students is based on the clickstream events on each learning material, transformed into a two-dimensional matrix, with features generated from the operations in students' reading behavior. We then contrasted this model's accuracy with that of a model considering weekly information. The results show that the weekly w/ content model achieves accuracy similar to the weekly w/o content model. The weekly w/ content model considers students' reading behavior per content, independently of time. However, switching the content order caused an accuracy drop for the weekly w/ content model, since students may not focus on each content equally, whereas switching the weekly information in the weekly w/o content model preserved better accuracy, since this model focuses on the total reading behavior in each week. Next, we reduced the matrix size by using a Random Forest to investigate feature importance, and found that the high-scoring operation features were the same for both lecture types. These features are the most essential actions for reading learning materials in the system. Finally, the heatmaps show that at-risk students paid little attention to studying and reviewing the learning materials, especially after half of the course period, so teachers should intervene before that point to protect these students from failing. In future work, we plan to explore how to learn an attention weight for each content within the same lecture type, to cope with instructors changing their lecture style and with changes in students' reading behavior. The purpose is to improve the weekly w/ content model, because it already performs well and explains almost all of students' reading behavior. We also want to optimize the model for higher accuracy by using other deep learning models, such as attention models [15], and by extracting new features from students' reading behavior.

Acknowledgements
This work was supported by JST SPRING Grant Number JPMJSP2136, JST AIP Grant Number JPMJCR19U1, and JSPS KAKENHI Grant Number JP18H04125, Japan.

References
[1] Baradwaj, B. K., & Pal, S. (2012). Mining educational data to analyze students' performance. arXiv preprint arXiv:1201.3417.
[2] Marbouti, F., Diefes-Dux, H. A., & Madhavan, K. P. (2016). Models for early prediction of at-risk students in a course using standards-based grading. Computers & Education, 103, 1-15.
[3] Albreiki, B., Zaki, N., & Alashwal, H. (2021). A Systematic Literature Review of Student' Performance Prediction Using Machine Learning Techniques. Education Sciences, 11(9), 552. doi:10.3390/educsci11090552
[4] González, M. R., Ruíz, M. D., & Ortin, F. (2021). Massive LMS log data analysis for the early prediction of course-agnostic student performance. Computers & Education, 163, 104108.
[5] Okubo, F., Yamashita, T., Shimada, A., & Konomi, S. (2017). Students' performance prediction using data of multiple courses by recurrent neural network. In A. F. Mohd Ayub, A. Mitrovic, J-C. Yang, S. L. Wong, & W. Chen (Eds.), Proceedings of the 25th International Conference on Computers in Education, ICCE 2017 - Main Conference Proceedings (pp. 439-444). Asia-Pacific Society for Computers in Education.
[6] Okubo, F., Yamashita, T., Shimada, A., Taniguchi, Y., & Konomi, S. (2018). On the prediction of students' quiz score by recurrent neural network. CEUR Workshop Proceedings, 2163.
[7] Murata, R., Minematsu, T., & Shimada, A. (2021). Early detection of at-risk students based on knowledge distillation RNN models. In Educational Data Mining 2021 (EDM 2021).
[8] López-Zambrano, J., Lara, J. A., & Romero, C. (2020). Towards Portability of Models for Predicting Students' Final Performance in University Courses Starting from Moodle Logs. Applied Sciences, 10(1), 354. doi:10.3390/app10010354
[9] Flanagan, B., & Ogata, H. (2017). Integration of learning analytics research and production systems while protecting privacy. In International Conference on Computers in Education (ICCE 2017) (pp. 333-338).
[10] Ogata, H., Oi, M., Mohri, K., Okubo, F., Shimada, A., Yamada, M., Wang, J., & Hirokawa, S. (2017). Learning analytics for e-book-based educational big data in higher education. In Smart Sensors at the IoT Frontier (pp. 327-350). Springer, Cham.
[11] Murata, R., Minematsu, T., & Shimada, A. (2020). OpenLA: Library for efficient e-book log analysis and accelerating learning analytics. In International Conference on Computers in Education (ICCE 2020) (pp. 301-306).
[12] Daniusis, P., & Vaitkus, P. (2008). Neural network with matrix inputs. Informatica, 19(4), 477-486. doi:10.15388/Informatica.2008.225
[13] Dalal, A., Imtiaz, A., & Ebrahim, A. (2021). Deep learning based network traffic matrix prediction. International Journal of Intelligent Networks, 2, 46-56.
[14] Altaf, S., Soomro, W. J., & Rawi, M. I. (2019). Student performance prediction using multi-layers artificial neural networks: A case study on educational data mining. In Proceedings of the 2019 3rd International Conference on Information System and Data Mining.
[15] Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention is all you need. In Advances in Neural Information Processing Systems (pp. 5998-6008).
[16] Conijn, R., Snijders, C., Kleingeld, A., & Matzat, U. (2017). Predicting student performance from LMS data: A comparison of 17 blended courses using Moodle LMS. IEEE Transactions on Learning Technologies, 10, 17-29.

Appendix A.
Features related to reading behavior
The following table presents the variables describing the reading operations collected from students' reading behavior on the e-book system. The descriptions refer to the dataset received from the LAK22 Data Challenge.

Feature   Operation Name      Description
F1        OPEN                Number of times the e-book was opened
F2        CLOSE               Number of times the e-book was closed
F3        NEXT                Number of moves to the next page
F4        PREV                Number of moves back to the previous page
F5        PAGE_JUMP           Number of jumps to a particular page
F6        ADD BOOKMARK        Number of bookmarks added
F7        BOOKMARK_JUMP       Number of jumps to a particular page from a bookmark
F8        ADD MARKER          Number of markers added
F9        ADD MEMO            Number of memos added
F10       CHANGE MEMO         Number of edits to an existing memo
F11       DELETE BOOKMARK     Number of bookmarks deleted
F12       DELETE MARKER       Number of markers deleted
F13       DELETE_MEMO         Number of memos deleted
F14       SEARCH              Number of searches within the e-book
F15       SEARCH_JUMP         Number of jumps to a particular page from the search results
F16       GETIT               Number of clicks on the "GETIT" button in the e-book
F17       NOTGETIT            Number of clicks on the "NOTGETIT" button in the e-book
F18       TOTAL_ACTION        Total number of operations on the e-book
F19       READING_TIME_MIN    Reading time for each lecture material (minutes)
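To illustrate how per-material features like these could be computed from the raw event stream, the following is a minimal pandas sketch. It is an assumption-laden stand-in for OpenLA [11]: the column names ('userid', 'contentsid', 'operationname', 'eventtime') and the 10-minute idle cap used for the reading-time estimate are hypothetical, not taken from the paper or the dataset documentation.

```python
import pandas as pd

# F1-F17 operation names from the table above (shortened here for brevity).
OPERATIONS = ["OPEN", "CLOSE", "NEXT", "PREV", "PAGE_JUMP"]

def extract_features(events: pd.DataFrame) -> pd.DataFrame:
    """Count each operation (F1-F17), total actions (F18), and reading
    time in minutes (F19) per student and lecture material."""
    events = events.sort_values(["userid", "eventtime"]).copy()
    # Reading time per event: gap until the same student's next event,
    # capped at 10 minutes to discount idle periods (the cap is an assumption).
    gap = events.groupby("userid")["eventtime"].transform(
        lambda t: t.diff().shift(-1)
    )
    events["reading_time_min"] = (
        gap.dt.total_seconds().div(60).clip(upper=10).fillna(0)
    )
    counts = (
        events.pivot_table(index=["userid", "contentsid"],
                           columns="operationname",
                           aggfunc="size", fill_value=0)
        .reindex(columns=OPERATIONS, fill_value=0)
    )
    counts["TOTAL_ACTION"] = events.groupby(["userid", "contentsid"]).size()
    counts["READING_TIME_MIN"] = (
        events.groupby(["userid", "contentsid"])["reading_time_min"].sum()
    )
    return counts.reset_index()
```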