Using a Hierarchical Clustering Algorithm to Explore the Relationship Between Students' Program Debugging and Learning Performance

Chao Hung Liu and Ting-Chia Hsu
Department of Technology Application and Human Resource, National Taiwan Normal University, Taiwan

ABSTRACT
Programming courses pose a significant challenge for students who are just starting to learn a programming language. Many beginners, upon encountering an "ERROR" message from the system, tend to give up on learning. Other students, however, persist through these difficulties, keep working until their code is complete, and achieve better learning outcomes. This study therefore clustered students based on their debugging behavior in a programming course and examined how the number of successful program runs and course grades differ across clusters with different debugging frequencies.

Keywords
Learning Analytics, Trial and Error, Agglomerative Hierarchical Clustering

1. Introduction
Computer programming courses have long been a significant challenge for students entering the field of information technology. Students must express their needs using computer-understandable terms, logic, and thinking, and they often encounter obstacles in the process (Feurzeig et al., 2011). This challenge is considered a global issue: both introductory and advanced programming language courses face high dropout rates, creating substantial pressure on students and teachers who may have high expectations for themselves (Luxton-Reilly, 2016). Many students give up when confronted with multiple syntax and logic errors, indicating a potential lack of problem-solving skills and perseverance (Cheah, 2020). In light of these challenges, this study primarily explored students' behavior records during the debugging process of coding, as errors represent obstacles and setbacks.
Whether students can progress through these setbacks is a key factor in their improvement and success. The study was designed to analyze and cluster students' debugging behavior data in programming courses, as collected through the Learning Management System. It addresses three main research questions:
• RQ1. Can errors made by students in coding be differentiated into distinct clusters?
• RQ2. Is there a difference in the number of successful program runs (Success_run) among students in different programming error clusters?
• RQ3. Is there a difference in course learning scores (Score) among students in different programming error clusters?

LAK-WS 2024: Joint Proceedings of LAK 2024 Workshops, March 18–19, Kyoto, Japan. © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

2. Related work
2.1. Learning analytics for online learning
Educational Data Mining (EDM) and Educational Process Mining (EPM) are data science approaches for analyzing data from various Learning Management Systems (LMS) (Bogarín, Cerezo & Romero, 2018). This kind of data mining is an important part of learning analytics. Particularly with the rapid expansion of Massive Open Online Courses (MOOCs), the interaction between online educational resources and learners is stored in extensive databases, creating educational big data for interpretation by educators (Ruipérez-Valiente et al., 2022). Many learning analytics studies explore dropout rates (failure rates) on online education platforms (Qian et al., 2022). Researchers have also analyzed differences in learning behaviors on a platform (Tong & Zhan, 2023), ultimately aiming to predict students' learning achievements (grades) (Li, Du & Yu, 2023).
These studies frequently involve classifying learners based on their learning preferences, allowing them to adapt their learning experiences according to the platform's diverse learning paths, recommendation systems, personalized learning strategies, and more.

2.2. Trial and Error in Programming Learning
Learning a programming language involves several kinds of learning, such as problem-solving skills, computational thinking, and syntax comprehension (Nouri, Zhang, Mannila & Norén, 2020). Importantly, students need to sustain high motivation and genuine engagement in coding to make progress in learning how to program (Silva & Silveira, 2020). From the perspective of beginners, the difficulty lies in the inability to break down large problems into smaller ones. When hearing specific terms (such as recursion or arrays), students may understand how they work but struggle to translate that understanding into actual code (Lister et al., 2004). In this context, it is crucial to encourage students to engage in trial and error, as the experiences gained from mistakes help students engage in self-reflection and can even stimulate strong motivation to find satisfactory answers and iterate through the trial-and-error process (Sicora, 2019).

Since the sixteenth century, people have sought solutions to problems by encountering stimuli that lead to subsequent actions, which may involve continuous trial and error until success or, in some cases, giving up, which results in task failure (Boswell, 1947). Learning is inherently challenging, with difficulties progressing from shallow to deep. For example, students may encounter "Errors" while programming, which could contribute to the failure rate in programming courses (Porter, Guzdial, McDowell & Simon, 2013) but can also serve as the driving force for problem solving (Noh & Lee, 2020).
Programming education is often considered to have a high entry threshold, possibly due to insufficient problem-solving skills among students or ineffective use of learning materials (Cheah, 2020). Therefore, in programming courses, continuous trial and error by students is viewed as a positive behavioral performance. This practice signifies students' continuous attempts, whether in syntax or logical reasoning, until they produce results matching their expectations (Ye et al., 2022). Efforts in trial and error also help students enhance their self-efficacy, as they gain confidence in how to deal with errors and develop problem-solving skills (Ahn, Mao, Sung & Black, 2017).

2.3. Agglomerative Hierarchical Clustering in Online Courses
Hierarchical clustering is an unsupervised algorithm that organizes data points into a tree-like structure on a two-dimensional plane. It groups data points and produces a hierarchical structure based on the differences between data points (Alpaydin, 2020). Agglomerative hierarchical clustering is a bottom-up hierarchical clustering method that visualizes the hierarchical structure and underlying data clustering structure (Liu, Xu, Zeng & Ren, 2021). It is also a user-friendly and popular clustering algorithm.

The agglomerative hierarchical clustering process first assigns each object to its own cluster. It then decides which clusters to merge using distance or similarity measures (e.g., Euclidean distance for quantitative data, Manhattan distance for ordered but not necessarily quantitative data) or more complex methods (e.g., the unweighted pair group method with arithmetic mean, UPGMA) (Oyelade et al., 2019). The algorithm proceeds as follows: given N samples, there are initially N clusters, each containing one sample. The algorithm then iteratively merges the two closest clusters based on the chosen distance or similarity measure until the number of clusters is reduced to 1 or reaches a user-specified number (Cichosz, 2014).
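The merging procedure described above can be illustrated with a minimal sketch. It is not the implementation used in this study: it assumes Euclidean distance with average linkage (UPGMA), uses a brute-force search that is only practical for small inputs, and the function names `average_linkage` and `agglomerate` as well as the sample points are hypothetical.

```python
from itertools import combinations
from math import dist


def average_linkage(a, b):
    """UPGMA distance: mean Euclidean distance over all cross-cluster pairs."""
    return sum(dist(p, q) for p in a for q in b) / (len(a) * len(b))


def agglomerate(points, n_clusters=1):
    """Merge the two closest clusters until n_clusters remain."""
    # Start with N singleton clusters, one per sample.
    clusters = [[p] for p in points]
    while len(clusters) > n_clusters:
        # Find the pair of clusters with the smallest linkage distance.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda ij: average_linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        # Merge cluster j into cluster i (i < j, so deleting j keeps i valid).
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical 2-D samples: two tight groups far apart.
samples = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9)]
result = agglomerate(samples, n_clusters=2)
print(sorted(len(c) for c in result))  # [2, 3]
```

Stopping at `n_clusters=1` instead reproduces the full merge sequence that a dendrogram visualizes.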
In each successive iteration, the algorithm merges the closest pair of clusters based on the similarity criterion of features between data points until all data are in one cluster (Sasirekha & Baby, 2013). Hierarchical clustering helps analyze educational big data, helping researchers identify different student learning styles, achievements, and behaviors, as well as assess individual engagement levels (Hung, Liu, Liang & Su, 2020; Trivedi & Patel, 2020; Yang, Chen, Flanagan & Ogata, 2022).

3. Method
3.1. Data mining methods
This research uses the "Learning Behavior and Learning Strategies" dataset collected by Lu et al. (2022). This dataset predominantly consists of various actions recorded on a Learning Management System (LMS) as students engaged in learning programming. It encompasses a range of data points such as the number of errors generated, instances of code copying, frequency of code execution, and academic grades. The primary focus of the dataset is to capture the learning behaviors and strategies of students while they undertake programming tasks within the LMS environment (Lu et al., 2022).

For the clustering analysis, this study focused on the "viscode.csv" dataset, specifically using the "IndentationError," "NameError," "SyntaxError," and "TypeError" fields. These four types of errors were defined as indicators of programming trial-and-error, representing the problems and difficulties students encountered while running their code. The "Viscode-success_run" and "Score" fields were used as indicators to validate the effectiveness of clustering (Table 1), serving as the basis for addressing Research Questions 2 and 3 in the study.

Table 1
Program trial and error and verification field description table
Program Error Field Name | Introduction | Validation Field Name | Introduction
PseudoID | The ID names of each student have been de-identified. | Cluster | This field describes the clusters obtained after grouping the program trial-and-error fields.
IndentationError | This field describes when syntax errors occur related to incorrect indentation. | Viscode-success_run | This field describes the number of successful program runs in the integrated development environment.
NameError | This field describes when local or global names are not found. | Score | This field describes the final learning score for the course.
SyntaxError | This field describes when the syntax parser encounters a syntax error. | |
TypeError | This field describes when an operation or function is applied to inappropriate types of objects. | |

3.2. Dataset
This study compared actions taken by 452 students in a programming course using an integrated development environment, as recorded by the learning management system in the viscode.csv dataset. The data, preprocessed and de-identified, include distinct class fields (a-i), fields for interactions with the integrated development environment, debugging attempts, execution counts, successful run counts, and grades. After clustering, a new PseudoID field was introduced to represent a unique identifier for each student, also serving as an index after clustering (Ogata et al., 2017). Additionally, a Cluster field was added for conducting inter-group analysis of variances and comparing the correlation between program execution success and learning grades across clusters.

3.3. Designing Clustering Model
This research employs the agglomerative hierarchical clustering method from the Python sklearn.cluster module for systematic trial-and-error clustering. To avoid the skewing of results by any single feature, normalization is performed before clustering. This step prevents any one data column from exerting undue influence on the clustering outcome and helps limit the effect of outliers, which could be seen as noise. Following this preparatory step, the clustering process commences.
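A minimal sketch of this two-step pipeline is given below, under stated assumptions: the paper does not say which normalization was applied, so min-max scaling is assumed here; the error counts are invented for illustration and are not taken from viscode.csv; and the linkage is set to the Ward criterion named in this section, which is also scikit-learn's default.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering
from sklearn.preprocessing import MinMaxScaler

# Hypothetical error counts per student; columns follow the four error fields:
# IndentationError, NameError, SyntaxError, TypeError.
errors = np.array([
    [2, 10, 15, 3],
    [1, 8, 12, 2],
    [3, 12, 18, 4],
    [0, 9, 10, 1],
    [9, 80, 120, 30],
    [12, 95, 140, 35],
])

# Rescale every column to [0, 1] so that no single error type dominates
# the distance computation (min-max scaling is an assumption here).
scaled = MinMaxScaler().fit_transform(errors)

# Bottom-up clustering into two groups using Ward linkage.
model = AgglomerativeClustering(n_clusters=2, linkage="ward")
labels = model.fit_predict(scaled)
print(labels)
```

The numeric cluster labels are arbitrary, so which group is called 0 or 1 should be decided by inspecting the group contents rather than the label values.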
The linkage criterion adopted is Ward's method, which is designed to minimize the total within-cluster variance. At each step, Ward's method merges the two clusters whose merger produces the smallest possible increase in total within-cluster variance, thus preserving high similarity within the clusters. To assess the quality of clustering across different numbers of clusters, ranging from 2 to 9, silhouette scores were used. As Table 2 shows, the silhouette score is highest for 2 clusters, so the study identifies 2 as the optimal number of clusters and proceeds with further data analysis using this configuration.

Table 2
Silhouette coefficient grouping score table
Number of clusters | Cluster rating
2 | 0.58618
3 | 0.38826
4 | 0.35471
5 | 0.36514
6 | 0.25836
7 | 0.26924
8 | 0.22217
9 | 0.22226

4. Experiment results & discussion
4.1. Can errors made by students in coding be differentiated into distinct clusters?
This study used the scipy.cluster.hierarchy library in Python to perform clustering analysis. Through this library, hierarchical clustering results were computed and visualized, as shown in Fig. 1. The chart reveals two distinct clusters with noticeable distances between them. Cluster 0 comprises 414 student records, and Cluster 1 includes 38 student records. This study further applied Principal Component Analysis (PCA) to reduce the dimensions of data consisting of student IDs and their trial-and-error behaviors. Transforming the data into a two-dimensional chart made it straightforward to compare these behaviors against student performance. The Python library seaborn was used to create a heatmap (Fig. 2), which presents the average number of trial-and-error attempts by students across different clusters. The heatmap clearly shows that students in Cluster 1 had a higher frequency of programming errors, such as IndentationError, NameError, SyntaxError, and TypeError, compared to those in Cluster 0.
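The per-cluster averages that such a heatmap displays can be computed with a short aggregation. The sketch below uses only the standard library and invented counts; the `rows` data and the `cluster_means` name are hypothetical and do not reproduce the values in Fig. 2.

```python
from collections import defaultdict
from statistics import fmean

FIELDS = ("IndentationError", "NameError", "SyntaxError", "TypeError")

# Hypothetical rows: (cluster label, error counts in FIELDS order).
rows = [
    (0, (2, 10, 15, 3)),
    (0, (1, 8, 12, 2)),
    (0, (3, 12, 18, 4)),
    (1, (9, 80, 120, 30)),
    (1, (12, 95, 140, 35)),
]

# Collect the rows of each cluster, then average each field column-wise.
by_cluster = defaultdict(list)
for cluster, counts in rows:
    by_cluster[cluster].append(counts)

cluster_means = {
    cluster: dict(zip(FIELDS, (fmean(col) for col in zip(*members))))
    for cluster, members in sorted(by_cluster.items())
}

for cluster, means in cluster_means.items():
    print(cluster, means)
```

The resulting cluster-by-field table of means is the kind of matrix a heatmap function takes as input.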
The differences between Clusters 1 and 0 are especially large for NameError, SyntaxError, and TypeError. In this study, Cluster 1 is referred to as the "Frequent Trial-and-Error Group" and Cluster 0 as the "Regular Trial-and-Error Group."

Figure 1: Hierarchical Clustering Dendrogram

Figure 2: Heat map of program trial and error performance of different clusters

4.2. Is there a difference in the number of successful program runs (Success_run) among students in different programming error clusters?
To address Research Question 2, the study analyzed the "Success_run" field. Figure 3 shows that students in the frequent trial-and-error cluster had a higher average number of successful program runs than those in the regular trial-and-error cluster. This pattern suggests that students who frequently encountered system errors in the integrated development environment were more persistent, leading to more successful code executions. The study further employed an independent-samples t-test, using IBM SPSS, to investigate statistical differences between the clusters in terms of successful program runs. The findings showed a statistically significant difference (t = -5.49, p < .05), where the frequent trial-and-error cluster outperformed the regular trial-and-error cluster in successful program executions. This discrepancy likely arises from the frequent trial-and-error students' resilience and continuous engagement with problem-solving and coding adjustments, in contrast to students in the regular trial-and-error cluster, who may have experienced reduced motivation and task completion rates after facing setbacks. Consequently, the frequent trial-and-error cluster exhibited a significantly higher number of successful program runs than the regular trial-and-error cluster, highlighting their effective learning and problem-solving approach.
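As a cross-check of the reported statistic, the t value can be recomputed from group summary statistics alone. The sketch below is not the SPSS procedure used in the study; it applies the textbook formulas for the pooled-variance (Student) and unequal-variance (Welch) t statistics to the means, standard deviations, and group sizes reported in Table 3. The function names are hypothetical.

```python
from math import sqrt


def pooled_t(m1, s1, n1, m2, s2, n2):
    """Student's t statistic, assuming equal variances in both groups."""
    pooled_var = ((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2)
    return (m1 - m2) / sqrt(pooled_var * (1 / n1 + 1 / n2))


def welch_t(m1, s1, n1, m2, s2, n2):
    """Welch's t statistic, which does not assume equal variances."""
    return (m1 - m2) / sqrt(s1**2 / n1 + s2**2 / n2)


# Successful-run summary statistics as reported in Table 3:
# (mean, SD, N) for the 414-student and the 38-student cluster.
larger = (1358.88, 1016.89, 414)
smaller = (2323.34, 1243.08, 38)

print(round(pooled_t(*larger, *smaller), 2))  # -5.49
print(round(welch_t(*larger, *smaller), 2))   # -4.64
```

Both values match the two t entries in Table 3, which suggests that the table lists the equal-variance statistic first and the unequal-variance statistic second. Applying the same functions to the score statistics in Table 4 (78.80, 17.87, 414 versus 86.71, 10.31, 38) gives -2.69 and -4.19.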
Figure 3: Box plot of program trial and error with different groups of successfully running code

Table 3
Successfully running code and trial and error clusters were subjected to Independent Samples Test.
Independent Samples Test
Viscode-success_run | N | Mean | SD | Sig. | t | d
regular cluster | 414 | 1358.88 | 1016.89 | .023 | -5.49* | 0.39
frequent cluster | 38 | 2323.34 | 1243.08 | | -4.64* |
*. The mean difference is significant at the 0.05 level.

4.3. Is there a difference in course learning scores (Score) among students in different programming error clusters?
To explore Research Question 3, the study performed a descriptive analysis of the "Score" field across clusters differentiated by trial-and-error frequency. Figures 2 and 4 illustrate that clusters characterized by frequent trial-and-error tend to have higher scores than those with regular trial-and-error patterns. The box plot indicates that scores for the frequent trial-and-error group mostly fall between 80 and 90 points. In contrast, the regular trial-and-error group not only exhibited wider score fluctuations (70-85 points) but also presented numerous outliers significantly below the average, with some scores approaching zero. This suggests potential dropout behaviors among certain students in the regular trial-and-error group. The independent-samples t-test revealed a significant disparity in course performance between the two trial-and-error groups (t = -2.69, p < .05), as shown in Table 4. This finding resonates with the insights from RQ1 and Figure 1. The study segmented program-based learning students by their trial-and-error occurrences, noting that a smaller contingent of students (38 in total) fell into the frequent trial-and-error group. Notwithstanding their group size, these students not only surpassed the regular trial-and-error group in the number of successful program runs but also outscored them.
This underscores the significance of persistent trial-and-error efforts in learning; to achieve success, learners need to keep experimenting and overcome challenges without giving up (Dong et al., 2019).

Figure 4: Box plot of program trial and error with different groups of course score

Table 4
Course score and trial and error clusters were subjected to Independent Samples Test.
Independent Samples Test
Score | N | Mean | SD | Sig. | t | d
regular cluster | 414 | 78.80 | 17.87 | .020 | -2.69* | 0.26
frequent cluster | 38 | 86.71 | 10.31 | | -4.19* |
*. The mean difference is significant at the 0.05 level.

5. Conclusion
This study investigated the error patterns students demonstrated while debugging programs, categorizing them into two distinct groups: Cluster 0, the regular trial-and-error cluster, and Cluster 1, the frequent trial-and-error cluster. The analysis revealed that students in the frequent trial-and-error cluster had significantly more successful program runs and significantly higher final course scores than their counterparts in the regular trial-and-error cluster. These findings suggest that the debugging behaviors of different student groups can affect their learning outcomes. Educators should be cognizant of these differences and strategize appropriate responses to students' programming challenges. Moreover, when the frequency of trial-and-error attempts begins to wane, interventions such as encouragement or providing cues might be necessary to bolster students' motivation and align them with their peers (Xu, Yang, Liu & Jin, 2023).

Nonetheless, it is crucial to acknowledge the limitations of the dataset and context of this study. Given that all participants were novices in programming and enrolled in the same course, the breadth of their acquired knowledge may be limited. This research advocates for the use of more varied datasets and extended observational periods.
There is also considerable variation in the amount of time different students dedicate to studying programming. While some engaged with the course material for over seven hours, others may have invested mere minutes. As this study lacks precise data on students' study timings, future research could employ time series analysis to discern behavioral shifts among clusters and predict how debugging frequency might influence future learning. Such an approach could yield more precise learning recommendations and enable educators to tailor their focus on the effects of trial-and-error activities on academic achievement.

Acknowledgements
This study is supported in part by the National Science and Technology Council in the Republic of China under contract numbers NSTC 112-2628-H-003-007-.

6. References
[1] Feurzeig, W., Papert, S. A., & Lawler, B. (2011). Programming-languages as a conceptual framework for teaching mathematics. Interactive Learning Environments, 19(5), 487-501. https://doi.org/10.1080/10494820903520040
[2] Luxton-Reilly, A. (2016). Learning to program is easy. Paper presented at the Proceedings of the 2016 ACM Conference on Innovation and Technology in Computer Science Education (pp. 284-289). https://doi.org/10.1145/2899415.2899432
[3] Cheah, C. S. (2020). Factors contributing to the difficulties in teaching and learning of computer programming: A literature review. Contemporary Educational Technology, 12(2), ep272. https://doi.org/10.30935/cedtech/8247
[4] Bogarín, A., Cerezo, R., & Romero, C. (2018). A survey on educational process mining. Wiley Interdisciplinary Reviews: Data Mining and Knowledge Discovery, 8(1), e1230. https://doi.org/10.1002/widm.1230
[5] Ruipérez-Valiente, J. A., Staubitz, T., Jenner, M., Halawa, S., Zhang, J., Despujol, I., Rohloff, T. (2022). Large scale analytics of global and regional MOOC providers: Differences in learners' demographics, preferences, and perceptions. Computers & Education, 180, 104426. https://doi.org/10.1016/j.compedu.2021.104426
[6] Qian, Y., Li, C. X., Zou, X. G., Feng, X. B., Xiao, M. H., & Ding, Y. Q. (2022). Research on predicting learning achievement in a flipped classroom based on MOOCs by big data analysis. Computer Applications in Engineering Education, 30(1), 222-234. https://doi.org/10.1002/cae.22452
[7] Tong, Y., & Zhan, Z. (2023). An evaluation model based on procedural behaviors for predicting MOOC learning performance: Students' online learning behavior analytics and algorithms construction. Interactive Technology and Smart Education. https://doi.org/10.1108/ITSE-10-2022-0133
[8] Li, S., Du, J., & Yu, S. (2023). Diversified resource access paths in MOOCs: Insights from network analysis. Computers & Education, 204, 104869. https://doi.org/10.1016/j.compedu.2023.104869
[9] Nouri, J., Zhang, L., Mannila, L., & Norén, E. (2020). Development of computational thinking, digital competence and 21st century skills when learning programming in K-9. Education Inquiry, 11(1), 1-17. https://doi.org/10.1080/20004508.2019.1627844
[10] Silva, J., & Silveira, I. (2020). A systematic review on open educational games for programming learning and teaching. International Journal of Emerging Technologies in Learning (IJET), 15(9), 156-172. https://doi.org/10.3991/ijet.v15i09.12437
[11] Lister, R., Adams, E. S., Fitzgerald, S., Fone, W., Hamer, J., Lindholm, M., Seppälä, O. (2004). A multi-national study of reading and tracing skills in novice programmers. ACM SIGCSE Bulletin, 36(4), 119-150. https://doi.org/10.1145/1041624.1041673
[12] Sicora, A. (2019). Reflective practice and learning from mistakes in social work student placement. Social Work Education, 38(1), 63-74. https://doi.org/10.1080/02615479.2018.1508567
[13] Boswell, F. P. (1947). Trial and error learning. Psychological Review, 54(5), 282. https://doi.org/10.1037/h0058921
[14] Porter, L., Guzdial, M., McDowell, C., & Simon, B. (2013). Success in introductory programming: What works? Communications of the ACM, 56(8), 34-36. https://doi.org/10.1145/2492007.2492020
[15] Noh, J., & Lee, J. (2020). Effects of robotics programming on the computational thinking and creativity of elementary school students. Educational Technology Research and Development, 68, 463-484. https://doi.org/10.1007/s11423-019-09708-w
[16] Ye, Z., Jiang, L., Li, Y., Wang, Z., Zhang, G., & Chen, H. (2022). Analysis of Differences in Self-Regulated Learning Behavior Patterns of Online Learners. Electronics, 11(23), 4013. https://doi.org/10.3390/electronics11234013
[17] Ahn, J.-H., Mao, Y., Sung, W., & Black, J. B. (2017). Supporting debugging skills: Using embodied instructions in children's programming education. Paper presented at the Society for Information Technology & Teacher Education International Conference (pp. 19-26).
[18] Alpaydin, E. (2020). Introduction to machine learning. MIT Press.
[19] Liu, N., Xu, Z., Zeng, X.-J., & Ren, P. (2021). An agglomerative hierarchical clustering algorithm for linear ordinal rankings. Information Sciences, 557, 170-193. https://doi.org/10.1016/j.ins.2020.12.056
[20] Oyelade, J., Isewon, I., Oladipupo, O., Emebo, O., Omogbadegun, Z., Aromolaran, O., Olawole, O. (2019). Data clustering: Algorithms and its applications. Paper presented at the 2019 19th International Conference on Computational Science and Its Applications (ICCSA) (pp. 71-81). IEEE. https://doi.org/10.1109/ICCSA.2019.000-1
[21] Cichosz, P. (2014). Data mining algorithms: Explained using R. John Wiley & Sons.
[22] Sasirekha, K., & Baby, P. (2013). Agglomerative hierarchical clustering algorithm-a. International Journal of Scientific and Research Publications, 83(3), 83.
[23] Hung, H.-C., Liu, I.-F., Liang, C.-T., & Su, Y.-S. (2020). Applying educational data mining to explore students' learning patterns in the flipped learning approach for coding education. Symmetry, 12(2), 213. https://doi.org/10.3390/sym12020213
[24] Trivedi, S., & Patel, N. (2020). Clustering Students Based on Virtual Learning Engagement, Digital Skills, and E-learning Infrastructure: Applications of K-means, DBSCAN, Hierarchical, and Affinity Propagation Clustering. Sage Science Review of Educational Technology, 3(1), 1-13. https://journals.sagescience.org/index.php/ssret/article/view/6
[25] Yang, A. C., Chen, I. Y., Flanagan, B., & Ogata, H. (2022). How students' self-assessment behavior affects their online learning performance. Computers and Education: Artificial Intelligence, 3, 100058. https://doi.org/10.1016/j.caeai.2022.100058
[26] Lu, O. H., Huang, A. Y., Flanagan, B., Ogata, H., & Yang, S. J. (2022). A Quality Data Set for Data Challenge: Featuring 160 Students' Learning Behaviors and Learning Strategies in a Programming Course. Proceedings of the 30th International Conference on Computers in Education. Kuala Lumpur City, Malaysia.
[27] Ogata, H., Oi, M., Mohri, K., Okubo, F., Shimada, A., Yamada, M., Hirokawa, S. (2017). Learning analytics for e-book-based educational big data in higher education. Smart Sensors at the IoT Frontier, 327-350. https://doi.org/10.1007/978-3-319-55345-0_13
[28] Dong, Y., Marwan, S., Catete, V., Price, T., & Barnes, T. (2019). Defining tinkering behavior in open-ended block-based programming assignments. Paper presented at the Proceedings of the 50th ACM Technical Symposium on Computer Science Education (pp. 1204-1210). https://doi.org/10.1145/3287324.3287437
[29] Xu, W., Yang, L. Y., Liu, X., & Jin, P. N. (2023). Examining the effects of different forms of teacher feedback intervention for learners' cognitive and emotional interaction in online collaborative discussion: A visualization method for process mining based on text automatic analysis. Education and Information Technologies, 1-27. https://doi.org/10.1080/10494820903520040