Automated Identification of Relevant Worked Examples for Programming Problems

Muntasir Hoq¹, Atharva Patil¹, Kamil Akhuseyinoglu², Bita Akram¹ and Peter Brusilovsky²
¹North Carolina State University
²University of Pittsburgh

Abstract
Novice programmers can greatly benefit from worked examples demonstrating the implementation of programming concepts that are challenging to them. Although large repositories of effective worked examples have been generated by CS education experts, one main challenge is identifying the worked example most relevant to the particular programming problem assigned to a student and to their unique challenges in understanding and solving it. Previous studies have explored similar example recommendation approaches. Our work takes a novel approach by employing deep learning code representation models to extract code vectors that capture both syntactic and semantic similarities among programming examples. Motivated by the challenge of offering relevant and personalized examples to programming students, our approach focuses on similarity assessment and clustering techniques to identify similar code problems, examples, and challenges. We aim to provide more accurate and contextually relevant recommendations to students based on their individual learning needs. Providing tailored support to students in real time facilitates better problem-solving strategies and enhances students' learning experiences, contributing to the advancement of programming education.

Keywords
problem-solving support, program examples, code structure, code similarities
1. Introduction

Example-based problem solving is the cornerstone of intelligent tutoring systems (ITSs) in the programming domain [1]. When students encounter difficulties in problem solving, such systems aim to provide relevant examples to aid comprehension and resolution. Traditionally, selecting these examples has relied heavily on domain experts, a time-consuming and resource-intensive process, particularly as the volume of learning content expands. However, alternative approaches have emerged that seek to link problems and examples dynamically without expert intervention. Content-based methodologies, such as keyword-based approaches, analyze surface-level similarities but often lack the depth necessary to discern truly relevant content [2, 3]. In contrast, knowledge-based approaches draw on a semantic understanding of content, offering higher-quality links by focusing on the underlying concepts [4, 5].

The motivation for exploring innovative example selection methodologies arises from recognizing the significant benefits novice programmers can gain from worked examples that illustrate challenging programming concepts. Hosseini et al. [6] demonstrated the engagement and performance benefits of directly connecting worked examples and similar completion problems into a "bundle" in a tool called Program Construction Examples (PCEX). A more recent study [7] demonstrated that semantic similarity between connected problems and examples is one of the keys to the better problem-solving performance and persistence achieved when this connection is provided by a domain expert. In cases where worked examples and problems are not explicitly linked, it is essential to provide clear guidance to students, such as recommending semantically similar examples after a failed problem-solving attempt [8]. Despite the availability of extensive repositories of such examples curated by computer science (CS) education experts, a fundamental challenge persists: how to identify, with a scalable and reliable approach, the worked example most relevant to each student's specific learning needs and to the nuances of the programming problem at hand.

In response to this challenge, we aim to develop an automated recommender system that suggests the most relevant problems and examples to students when they face difficulty solving programming problems. We take a vector-based approach in which we embed problems and examples into vector representations that preserve their structural and semantic information. To this end, we leverage a deep learning code representation model, the Subtree-based Attention Neural Network (SANN) [9], to extract nuanced similarities among programming problems and examples. We applied this model to the problems and examples available in PCEX [6].

We aim to provide contextually relevant recommendations that enhance students' problem-solving abilities and enrich their learning experiences in programming education. Using the vectors extracted from SANN, we recommend similar worked examples for a problem based on vector similarity. To demonstrate the effectiveness of our recommendation system, we evaluated it using Top-N accuracy (N = 1, 3, and 5), which measures how often the correct example, as labeled by experts, appears within the top N recommendations. Additionally, we used clustering techniques such as DBSCAN and hierarchical clustering to group similar problems and examples, aiming to reduce the manual effort required from experts. Our results suggest that this method effectively identifies similar problems and examples, enabling us to provide guidance and support to students facing similar challenges. Using these techniques, we aim to bridge the gap between the vast repository of programming examples and problems and the lack of manual support for selecting resources according to the specific needs of individual students, thus fostering more effective and personalized learning experiences in programming education [10].

CSEDM'24: 8th Educational Data Mining in Computer Science Education Workshop, July 14, 2024, Atlanta, GA
mhoq@ncsu.edu (M. Hoq); aspatil2@ncsu.edu (A. Patil); kaa108@pitt.edu (K. Akhuseyinoglu); bakram@ncsu.edu (B. Akram); peterb@pitt.edu (P. Brusilovsky)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073.

2. Related Work

The concept of automatically connecting similar content items traces back to the pioneering work of Mayes and Kibby on educational hypertext [2, 3]. Initially, similarity-based navigation centered on keyword-level similarity, but due to its limited quality, this approach has since been supplanted by more robust semantic linking methodologies, often referred to as intelligent linking. One such approach is metadata-based linking, which computes similarity measures across various facets of metadata to generate higher-quality links [11].

In recent years, the focus has shifted towards ontology-based linking, particularly within the hypermedia research community. Ontology-based linking involves indexing documents with ontology terms and then leveraging ontological structures to identify similar documents [5, 12]. Although early investigations primarily targeted hypermedia applications, the educational domain subsequently adopted ontology-based linking methodologies [13].

Figure 1: Example 1 from the same bundle
Figure 2: Example 2 from the same bundle

In the programming domain, similarity assessment has mainly relied on content-level information.
For example, Gross et al. [14] linked Java programming content based on the similarity of Abstract Syntax Trees (ASTs) that encompassed the entire body of examples and problems. However, this approach may overlook finer-grained similarities in smaller code fragments [15]. Recent approaches have explored concept-based similarity methodologies [16], that is, representing examples and problems as vectors of domain concepts and measuring similarity between these vectors. Further work attempted to calculate ontology-based similarity metrics for programming items [8].

With the advent of automated program representation techniques based on deep learning [9], the extraction of syntactic and semantic structural information from programming code snippets has gained traction. These techniques offer the potential to alleviate the reliance on experts for designing isomorphic problem-example pairs, instead enabling the discovery of relevant examples within learning materials. While such similarity approaches hold promise across a range of code-related programming problems [17, 9], our study focuses on code comprehension problems, in which students are asked to predict the output or the final value of variables of a given program, rather than on program composition problems that require code-writing tasks.

Figure 3: Example 3 from a different bundle

3. Dataset

In this study, we used a Python programming dataset sourced from the PCEX system [6]. PCEX offers online access to working code examples and small completion problems referred to as "challenges". To increase learners' motivation and improve overall learning outcomes, problems and examples are further organized into bundles by domain experts, who group together problems and examples that target similar programming constructs and patterns. This combination approach was validated in previous research [6, 7], demonstrating its value across various metrics and stressing the importance of connecting learning and assessment at the level of specific programming patterns in addition to their traditional integration at the level of broader course topics.

The PCEX dataset comprises 123 programming code problems and examples that span 13 topics, including Variables and Operations, If-Else Statements, Boolean Expressions, For Loops, and Nested Loops. The problems and examples in the dataset are organized into 52 bundles, with an average of 4 bundles per topic. A bundle starts with a single fully worked-out example followed by 1 to 3 similar problems; on average, each bundle has 1.35 problems. We used the current content organization in PCEX, which represents expert knowledge, as the gold standard for the evaluation of content recommendation approaches [18]. Figures 1 and 2 show two program examples from the same bundle under the Variables and Operations topic. Figure 3 shows an example from a different bundle within the same topic to illustrate the difference in bundle structures.

4. Methodology

In this study, we used the Subtree-based Attention Neural Network (SANN) [9] to encode programs into vector representations. We computed the cosine similarity between these vectors to recommend the closest examples for a given problem. To further analyze and group similar problems and examples, we employed clustering techniques such as DBSCAN and hierarchical clustering.

SANN is our primary model for encoding programs in vector format. It has demonstrated its efficiency in capturing both syntactic and semantic information from programs in an interpretable manner and in modeling intricate code structure [9, 17, 19]. SANN operates by encoding the source code into vector representations using subtrees extracted from the Abstract Syntax Tree (AST) representation of the code.

Table 1: Top-N accuracy for recommending worked examples

  Top-N    Accuracy (%)
  Top-1    70.97
  Top-3    83.10
  Top-5    87.32
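As a simplified illustration of this encode-then-compare pipeline, the sketch below represents each program by the multiset of its AST node types and ranks candidate worked examples by cosine similarity. This is our own toy stand-in, not the authors' SANN implementation: SANN learns subtree embeddings with attention, whereas here `subtree_type_vector`, the snippet names, and the bag-of-node-types representation are all illustrative assumptions.

```python
import ast
from collections import Counter
from math import sqrt

def subtree_type_vector(code):
    """Represent a program as a bag of AST node types.

    A deliberately crude stand-in for SANN's learned subtree embeddings,
    used here only to illustrate the encode-then-compare pipeline.
    """
    tree = ast.parse(code)
    return Counter(type(n).__name__ for n in ast.walk(tree))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    keys = set(a) | set(b)
    dot = sum(a[k] * b[k] for k in keys)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb)

# A comprehension problem and two candidate worked examples (hypothetical).
problem = "total = 0\nfor i in range(5):\n    total += i"
examples = {
    "loop_sum": "s = 0\nfor x in range(10):\n    s += x",
    "branch":   "x = 3\nif x > 1:\n    print('big')",
}

# Rank candidate examples by structural similarity to the problem.
pv = subtree_type_vector(problem)
ranked = sorted(examples,
                key=lambda name: cosine(pv, subtree_type_vector(examples[name])),
                reverse=True)
print(ranked)
```

With real SANN vectors the comparison step is the same cosine ranking; only the representation is richer. Here the loop-based example naturally outranks the branching one because their AST node-type profiles are closer.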
These subtrees undergo a two-way embedding process in which each subtree and its constituent nodes are embedded individually. The resulting embeddings are then merged into a single embedded vector. Subsequently, the embedded vectors from both approaches are concatenated and passed through a time-distributed, fully connected layer, generating subtree vectors that incorporate both node-level and subtree-level information.

After the generation of subtree vectors, an attention neural network is employed to condense all subtree vectors into a single source code vector. The attention mechanism assigns a scalar weight to each subtree vector, facilitating the aggregation of all subtree vectors into a weighted average. These weights are determined through a normalized inner product between each subtree vector and a global attention vector, followed by a softmax function that ensures the weights sum to 1. The resulting weighted average of subtree vectors, as determined by the attention mechanism, encapsulates the entire source code snippet. The SANN model thus leverages attention weights to prioritize the most important subtrees when generating the source code vector. We recursively extract all subtrees from the AST, ensuring comprehensive coverage of the code structure during the encoding process.

Following the extraction of code vectors, we calculated cosine similarity to find the closest example for a given problem for the recommendation. Furthermore, we utilized clustering techniques, including DBSCAN and hierarchical clustering, to group similar problems and examples. DBSCAN is adept at identifying clusters of varying shapes and sizes while being robust to noise, whereas hierarchical clustering provides insights into the clustering structure through dendrogram analysis. Using these techniques, we aim to comprehensively explore the similarity structure within our dataset and facilitate the identification of cohesive groups of programming problems and examples.

5. Experiments and Results

5.1. Code Vector Extraction and Example Recommendation

We employed the Python AST parser (https://docs.python.org/3/library/ast.html) to parse Python programming code into ASTs. For SANN training, we partitioned our dataset into 80% training data and 20% testing data. During the splitting process, we ensured that no bundle was excluded from the training set, so that all the diverse structural variations were retained for comprehensive training. The embedding size for both subtree-based and node-based embeddings was set to 64, chosen from {64, 128, 256}; consequently, each source code vector was of size 128. Throughout the model training phase, we employed the Adamax optimizer [9] with a default learning rate of 0.001 to learn the weights of the matrices. The batch size was set to 32, and the maximum number of epochs was capped at 200 with an early stopping patience of 20 to prevent overfitting.

The dataset has problems/challenges and examples bundled together based on similarity (these groups are called bundles), and different bundles are combined under different topics by the experts. The dataset therefore forms a hierarchy: each topic has several bundles, and each bundle has several similar challenges and examples. If a student faces difficulty with a problem, an example from the same bundle should be recommended. We trained the SANN model using only the topic information for challenges and examples, intentionally omitting any bundle information. Although bundles encapsulate more detailed, granular information about the program structures, our objective was for SANN to learn this granular insight exclusively from the more superficial and abstract topic information. By training on topics alone, we aim to enable SANN to generalize effectively across diverse program structures and to reconstruct the underlying bundles based on their similarity in program pattern and structure, so that these reconstructions can be evaluated against the expert-identified bundles. There are 13 topics and 52 bundles in the dataset. After training, we tested the model on the test data to predict the associated topics; SANN showed a testing accuracy of 88%. Afterward, we extracted the source code vectors for the problems and examples from SANN for further study.

Finally, we investigated the effectiveness of these vectors in forming groups of similar examples and problems that can serve as a recommendation tool. We calculated the cosine similarity to find the closest example for a given problem. If the closest example comes from the same expert-identified bundle as the challenge, the recommended example is counted as correct. We calculated the Top-N accuracy for N = 1, 3, and 5, as stated in Table 1. The experimental results suggest that our recommender can effectively find similar worked examples for a given problem when a student is facing difficulty. However, we speculate that this accuracy could be improved by training SANN on a bigger dataset, since the current dataset has only 123 challenges and examples, and the average number of examples per problem is only 0.73. We plan to investigate the impact of dataset size on performance in the future.

We further hypothesize that, since bundles contain very similar challenges and examples, the corresponding vectors should reflect these similarities by being closer to the programs of the same bundle than to others. The same hypothesis applies to topics; however, topics contain somewhat less similar challenges and examples, so the vectors within a topic should be close to each other but not as close as those within a bundle. According to our hypothesis, the vectors in these bundles and topics should therefore show patterns in their tightness, where tightness refers to the average distance between the points of a bundle or topic. To calculate the tightness, we used the expert labels from the dataset as the gold standard to demonstrate the effectiveness of our method and verify the hypothesis. For each topic or bundle, we first calculated the pairwise distances between all the points within it and then took the mean of these pairwise distances, which is the tightness of the vectors of that topic or bundle. Figure 4 shows a scatter plot of the bundles after projecting the vectors to two dimensions with PCA.

Figure 4: Bundle clusters

To verify our hypothesis, we calculated the degree of tightness for (1) vectors of the same bundle, (2) vectors of the same topic, and (3) all vectors in the dataset (the entire course). Figure 5 shows the topic-level tightness, and Figure 6 shows the bundle-level tightness. Here, we can see that bundles have lower distances, whereas topics have higher distances. We plotted the mean degree of tightness for topics, bundles, and the whole dataset in Figure 7 to get a clearer comparative view. For topics and bundles, the average tightness measures over all individual topics and bundles were calculated. The average tightness of bundles was 0.4 units and the average tightness of topics was 0.8, which implies that the points of a bundle are much closer to each other than those of a topic. Finally, the average distance between all dataset samples was 2.7 units. These results suggest that samples belonging to the same bundle are semantically very similar to each other; samples belonging to the same topic show more variation than those within a bundle but are still more similar to each other than to samples from other topics in the course.

Figure 5: Average tightness of topics
Figure 6: Average tightness of bundles
Figure 7: Average tightness of topics and bundles

5.2. Clustering Similar Examples

We investigated the effectiveness of multiple clustering approaches in identifying bundles of similar problems and examples. First, we employed DBSCAN clustering for topics, given its capability to handle irregularly shaped clusters when the number of clusters (topics) is unknown and differently structured problems and examples can fall under the same topic. Setting the epsilon value to 0.85 and the minimum-points parameter to 2, we identified 13 distinct clusters based on topics, with only 2 points classified as noise. We require each topic cluster to have at least 2 points: if a point is not in the vicinity of any other, it is better treated as noise than forced into a cluster. Figure 8 shows a scatter plot visualization that highlights the nonspherical nature of the clusters, indicating their irregular shapes.

Figure 8: Topic clusters using DBSCAN

We calculated the accuracy of the topic clustering with DBSCAN by determining a clustering error, assessed by comparing the assigned clusters to gold-standard clusters based on the predefined topics. Specifically, we calculated how many items were assigned to clusters that did not match their actual topic labels. The clustering error averaged 11.69% (std. dev. 0.15) over all the problems and examples. The highest clustering error for a topic was 44%, for the topic "Strings." This topic can be considered an outlier because the code for string programs is likely similar to that of other topics in which some string operations are also required. It is important to note that three topics, "For Loops," "Nested Loops," and "Lists," were assigned to the same cluster. We found that these topics are very similar in structure and have overlapping patterns, e.g., using loops to traverse a list, or hierarchical For loops in Nested Loops.

Hierarchical clustering was utilized for the bundles inside topics, as it allows the exploration of hierarchical structures within the data and accommodates scenarios where the number of clusters is uncertain. DBSCAN may not be ideal here because the plotted points for bundles are unlikely to have irregular shapes, since problems and examples inside a bundle tend to be the most similar. Hierarchical clustering starts by treating each sample as a separate cluster and then repeatedly executes two steps: (i) identify the two clusters that are closest together, and (ii) merge these two most similar clusters. This iterative process continues until all the clusters have been merged.

The dendrogram from hierarchical clustering showed that samples sharing similar bundles and topics clustered closely together, with their parent clusters predominantly aligning with their respective topics. In addition, we assessed the closest sample for each item, categorizing the pairs by bundle name and topic similarity. Based on these closest-sample data, we counted the items whose closest sample had (1) the same bundle name, (2) a different bundle name but the same topic, and (3) a different bundle name and topic. As evident in Figure 9, 43.9% of the items had their closest sample from the same bundle and 30.9% had their closest pair from the same topic, whereas 25.2% of the samples had their closest pair from a different topic and bundle. This result suggests that samples of the same topic are closer and contained within the same local region, and that samples belonging to the same bundle are even closer to each other. However, discrepancies between the clustering results and the expert labels emerged when problems and examples involved multiple topics or bundles, for example, the use of loops in For Loops, Nested Loops, Lists, and Strings.

Figure 9: Hierarchical clustering summary

6. Discussion

In this study, we addressed the long-standing challenge of dynamically recommending relevant programming examples tailored to individual student needs within the context of computer science (CS) education. Our approach centered on leveraging the Subtree-based Attention Neural Network (SANN) model to extract nuanced syntactic and semantic similarities among programming examples, thus facilitating the identification of the analogous examples crucial for problem-solving support. In this study, SANN was trained only on the topic information of the examples, although the dataset also contains bundle information, where similar problems and examples are bundled together under a topic. We used topic-level information about the problems and examples to gain deeper structural insight with SANN, which helps to identify similar worked examples for struggling students. Using the extracted code vectors, we recommend worked examples for a given problem based on vector similarity. The experiments suggest that the recommendations have an accuracy of 70.97%, 83.10%, and 87.32% for the Top-1, Top-3, and Top-5 recommendations, respectively.

In addition, we demonstrated the effectiveness of these vectors by measuring the tightness of each topic and bundle in the course. The results suggest that the bundles represent very similar problems and examples, as reflected by the proximity of their corresponding vectors. In contrast, the topics contain multiple bundles with somewhat less similar problems and examples; consequently, the vectors within a topic are close to each other but not as tightly clustered as those within a bundle. We further employed clustering techniques, including DBSCAN and hierarchical clustering, to group similar programming problems and examples effectively and to alleviate the expert effort of bundling similar problems and examples. This outcome highlights the initial effectiveness of our approach in organizing and understanding the structural and semantic relationships inherent in programming education datasets. However, with the limited training data (the current dataset has only 123 problems and examples, and the average number of examples per problem is only 0.73), our clustering and recommendation performance did not fully align with the expert labels. We hypothesize that minor structural changes and overlapping topics in smaller problems and examples could be captured more accurately with a larger dataset. Exploring this possibility is an interesting direction for future research.

The significance of our study lies in addressing a key challenge in CS education: identifying relevant and contextually appropriate programming examples [1]. By offering a methodological framework for dynamically recommending personalized examples, our study provides a scalable alternative to the resource-intensive process of example selection traditionally reliant on domain experts. Our approach effectively connects the extensive collection of programming examples with the unique needs of individual students, improving programming education by promoting more efficient and personalized learning experiences [6, 20].

7. Limitations and Future Work

A few limitations of this study need to be addressed. First, SANN was trained on the topics associated with each problem and example as labeled by experts. This setup limits our ability to use the vast corpus of worked examples and programming problems that are not labeled with topics. In the future, we want to eliminate this limitation by training the SANN model in a topic-agnostic way. We propose training the model not on explicit topic information but on the underlying code structure using an encoder-decoder architecture. In this approach, the encoder would process the source code to generate a latent representation that captures the structural and semantic nuances of the code, and the decoder would then reconstruct the code from this latent representation. This unsupervised learning method aims to enable the model to understand and encode the intricate structure of the code more effectively, leading to better generalization and more accurate recommendations based on structural similarities rather than predefined topic labels.

Additionally, when we explored clustering techniques, we observed that some worked examples are similar even though they come from different topics. This happens because some topics overlap with previously learned topics; for example, List problems may require knowledge of loops. In such cases, we might in the future consider sub-categories of these bundles in order to recommend material from previous topics when necessary, based on a student's difficulty progression. For example, if a student struggles with traversing a list due to difficulties using loops, they would benefit from revisiting similar examples that focus on loops from previously covered topics.

Another future direction of this work is to make the recommendations more personalized based on student knowledge. We want to track students' learning at various stages of the course and incorporate that information when recommending examples for the problems they currently face. The tracing of student learning can also be done at the topic level: if a student faces difficulty in a particular topic, this can be important information alongside the problem's code structure for the recommender system. In addition, repeated struggle with the same topic can act as an alarm for instructors, indicating that a student needs personalized intervention and support. We also intend to add baselines from the literature in a future study to show the comparative effectiveness of our framework.

8. Conclusion

In this study, we used the Subtree-based Attention Neural Network (SANN) model to recommend relevant programming examples tailored to individual student needs in computer science (CS) education. Through clustering techniques, including DBSCAN and hierarchical clustering, we effectively organized the structural and semantic relationships of problems and examples to guide the recommendation of similar practice material to programming students. Our approach offers a scalable solution to the resource-intensive process of example selection, providing contextually appropriate learning resources tailored to individual student needs.

References

[1] P. Brusilovsky, C. Peylo, Adaptive and intelligent web-based educational systems, International Journal of Artificial Intelligence in Education 13 (2003) 156–169.
[2] M. Kibby, J. Mayes, Towards intelligent hypertext, Hypertext: Theory into Practice (1989) 164–172.
[3] J. T. Mayes, M. R. Kibby, H. Watson, StrathTutor: The development and evaluation of a learning-by-browsing system on the Macintosh, Computers & Education 12 (1988) 221–229.
[4] K. R. Koedinger, A. T. Corbett, C. Perfetti, The knowledge-learning-instruction framework: Bridging the science-practice chasm to enhance robust student learning, Cognitive Science 36 (2012) 757–798.
[5] L. Carr, W. Hall, S. Bechhofer, C. Goble, Conceptual linking: ontology-based open hypermedia, in: Proceedings of the 10th International Conference on World Wide Web, 2001, pp. 334–342.
[6] R. Hosseini, K. Akhuseyinoglu, P. Brusilovsky, L. Malmi, K. Pollari-Malmi, C. Schunn, T. Sirkiä, Improving engagement in program construction examples for learning Python programming, International Journal of Artificial Intelligence in Education 30 (2020) 299–336.
[7] K. Akhuseyinoglu, A. Klašnja-Milićević, P. Brusilovsky, The impact of connecting worked examples and completion problems for introductory programming practice, in: European Conference on Technology Enhanced Learning (EC-TEL 2024), Lecture Notes in Computer Science, Springer International Publishing, 2024.
[8] R. Hosseini, P. Brusilovsky, A study of concept-based similarity approaches for recommending program examples, New Review of Hypermedia and Multimedia 23 (2017) 161–188.
[9] M. Hoq, S. R. Chilla, M. Ahmadi Ranjbar, P. Brusilovsky, B. Akram, SANN: Programming code representation using attention neural network with optimized subtree extraction, in: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023, pp. 783–792.
[10] K. Muldner, J. Jennings, V. Chiarelli, A review of worked examples in programming activities, ACM Transactions on Computing Education 23 (2022) 1–35.
[11] D. Tudhope, C. Taylor, Navigation via similarity: automatic linking based on semantic closeness, Information Processing & Management 33 (1997) 233–242.
[12] M. Crampes, S. Ranwez, Ontology-supported and ontology-driven conceptual navigation on the World Wide Web, in: Proceedings of the 11th ACM Conference on Hypertext and Hypermedia, 2000, pp. 191–199.
[13] P. Dolog, N. Henze, W. Nejdl, Logic-based open hypermedia for the semantic web, in: Proceedings of the International Workshop on Hypermedia and the Semantic Web, Hypertext 2003 Conference, 2003.
[14] S. Gross, B. Mokbel, B. Hammer, N. Pinkwart, How to select an example? A comparison of selection strategies in example-based learning, in: Proceedings of Intelligent Tutoring Systems: 12th International Conference, ITS 2014, Springer, 2014, pp. 340–347.
[15] G. Weber, A. Mollenberg, ELM-PE: A knowledge-based programming environment for learning Lisp, 1994.
[16] R. Hosseini, P. Brusilovsky, Example-based problem solving support using concept analysis of programming content, in: Proceedings of Intelligent Tutoring Systems: 12th International Conference, ITS 2014, Springer, 2014, pp. 683–685.
[17] M. Hoq, Y. Shi, J. Leinonen, D. Babalola, C. Lynch, T. Price, B. Akram, Detecting ChatGPT-generated code submissions in a CS1 course using machine learning models, in: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1, 2024, pp. 526–532.
[18] A. J. Sabet, I. Alpizar-Chacon, J. Barria-Pineda, P. Brusilovsky, S. Sosnovsky, A. Lan, et al., Enriching intelligent textbooks with interactivity: When smart content allocation goes wrong, in: Proceedings of the 4th International Workshop on Intelligent Textbooks, volume 3192, 2022.
[19] M. Hoq, J. Vandenberg, B. Mott, J. Lester, N. Norouzi, B. Akram, Towards attention-based automatic misconception identification in introductory programming courses, in: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 2, 2024, pp. 1680–1681.
[20] M. Hoq, P. Brusilovsky, B. Akram, Analysis of an explainable student performance prediction model in an introductory programming course, in: Proceedings of the 16th International Conference on Educational Data Mining, 2023, pp. 79–90.