=Paper= {{Paper |id=Vol-3796/short4 |storemode=property |title=Automated Identification of Relevant Worked Examples for Programming Problems |pdfUrl=https://ceur-ws.org/Vol-3796/CSEDM-24_paper_9053.pdf |volume=Vol-3796 |authors=Muntasir Hoq,Atharva Patil,Kamil Akhuseyinoglu,Bita Akram,Peter Brusilovsky |dblpUrl=https://dblp.org/rec/conf/edm/HoqPAAB24 }} ==Automated Identification of Relevant Worked Examples for Programming Problems== https://ceur-ws.org/Vol-3796/CSEDM-24_paper_9053.pdf
                         Automated Identification of Relevant Worked Examples for
                         Programming Problems
                         Muntasir Hoq1 , Atharva Patil1 , Kamil Akhuseyinoglu2 , Bita Akram1 and Peter Brusilovsky2
                         1
                             North Carolina State University
                         2
                             University of Pittsburgh


                                           Abstract
                                           Novice programmers can greatly benefit from using worked examples demonstrating the implementation of programming concepts
                                           that are challenging to them. Although large repositories of effective worked examples have been generated by CS education experts,
one main challenge is identifying the worked example most relevant to the particular programming problem assigned
to a student and to their unique difficulties in understanding and solving it. Previous studies have explored similar example
                                           recommendation approaches. Our work takes a novel approach by employing deep learning code representation models to extract
                                           code vectors, capturing both syntactic and semantic similarities among programming examples. Motivated by the challenge of offering
                                           relevant and personalized examples to programming students, our approach focuses on similarity assessment approaches and clustering
                                           techniques to identify similar code problems, examples, and challenges. We aim to provide more accurate and contextually relevant
                                           recommendations to students based on their individual learning needs. Providing tailored support to students in real-time facilitates
                                           better problem-solving strategies and enhances students’ learning experiences, contributing to the advancement of programming
                                           education.

                                           Keywords
                                           problem-solving support, program examples, code structure, code similarities



1. Introduction

Example-based problem solving is the cornerstone of intelligent tutoring systems (ITSs) in the programming domain [1]. When students encounter difficulties in problem solving, such systems aim to provide relevant examples to aid comprehension and resolution. Traditionally, selecting these examples has relied heavily on domain experts, a time-consuming and resource-intensive process, particularly as the volume of learning content expands. However, alternative approaches have emerged, seeking to link problems and examples dynamically without expert intervention. Content-based methodologies, such as keyword-based approaches, analyze surface-level similarities but often lack the depth necessary to discern truly relevant content [2, 3]. In contrast, knowledge-based approaches draw on a semantic understanding of content, offering higher-quality links by focusing on the underlying concepts [4, 5].

The motivation for exploring innovative example selection methodologies arises from recognizing the significant benefits novice programmers can gain from worked examples that illustrate challenging programming concepts. Hosseini et al. [6] demonstrated the engagement and performance benefits of directly connecting worked examples and similar completion problems into a "bundle" in a tool called Program Construction Examples (PCEX). A more recent study [7] showed that semantic similarity between connected problems and examples is one of the keys to the better problem-solving performance and persistence achieved when this connection is provided by a domain expert. In cases where worked examples and problems are not explicitly linked, it is essential to provide clear guidance to students, such as recommending semantically similar examples after a failed problem-solving attempt [8]. Despite the availability of extensive repositories of such examples curated by computer science (CS) education experts, a fundamental challenge persists: how to identify, in a scalable and reliable way, the worked example most relevant to each student's specific learning needs and the nuances of the programming problem at hand.

In response to this challenge, we aim to develop an automated recommender system that suggests the most relevant problems and examples to students when they face difficulty solving programming problems. We take a vector-based approach in which we embed problems and examples into vector representations that preserve their structural and semantic information. To this end, we leverage a deep learning code representation model, the Subtree-based Attention Neural Network (SANN) [9], to extract nuanced similarities among programming problems and examples. We applied this model to the problems and examples available in PCEX [6].

We aim to provide contextually relevant recommendations that enhance students' problem-solving abilities and enrich their learning experiences in programming education. Using the vectors extracted by SANN, we recommend similar worked examples to students for a given problem based on vector similarity. To demonstrate the effectiveness of our recommendation system, we evaluated it using Top-N accuracy (N = 1, 3, and 5), which measures how often the correct example, as labeled by experts, appears within the top N recommendations. Additionally, we used clustering techniques such as DBSCAN and hierarchical clustering to group similar problems and examples, aiming to reduce the manual effort required of experts. Our results suggest that this method effectively identifies similar problems and examples, enabling us to provide guidance and support to students facing similar challenges. Using these techniques, we aim to bridge the gap between the vast repository of programming examples and problems and the lack of manual support for selecting resources according to the specific needs of individual students, thus fostering more effective and personalized learning experiences in programming education [10].

CSEDM'24: 8th Educational Data Mining in Computer Science Education Workshop, July 14, 2024, Atlanta, GA
mhoq@ncsu.edu (M. Hoq); aspatil2@ncsu.edu (A. Patil); kaa108@pitt.edu (K. Akhuseyinoglu); bakram@ncsu.edu (B. Akram); peterb@pitt.edu (P. Brusilovsky)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
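The recommendation pipeline outlined above — embed each problem and example as a vector, rank the examples by cosine similarity to the problem, and score the ranking with Top-N accuracy — can be sketched as follows. This is an illustrative sketch, not the authors' implementation: the vectors are toy 2-D stand-ins for the 128-dimensional SANN embeddings, and the function names and bundle labels are hypothetical.

```python
import numpy as np

def recommend(problem_vec, example_vecs, n=3):
    """Indices of the n examples most cosine-similar to the problem."""
    p = problem_vec / np.linalg.norm(problem_vec)
    e = example_vecs / np.linalg.norm(example_vecs, axis=1, keepdims=True)
    # Rows of e and p are unit vectors, so the dot product is cosine similarity.
    return np.argsort(e @ p)[::-1][:n]

def top_n_accuracy(problem_vecs, problem_bundles, example_vecs, example_bundles, n):
    """Fraction of problems whose top-n list contains an example
    from the problem's own expert-assigned bundle."""
    hits = sum(
        any(example_bundles[i] == bundle
            for i in recommend(vec, example_vecs, n))
        for vec, bundle in zip(problem_vecs, problem_bundles)
    )
    return hits / len(problem_vecs)

# Toy data: two problems and two examples, one bundle each.
problems = np.array([[0.9, 0.1], [0.2, 0.8]])
examples = np.array([[1.0, 0.0], [0.0, 1.0]])
print(top_n_accuracy(problems, ["A", "B"], examples, ["A", "B"], n=1))  # 1.0
```

In the paper's setting, a recommendation counts as correct when a top-ranked example comes from the same expert-assigned bundle as the problem.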



CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
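The "tightness" measure the paper later uses (Section 5) to validate these vector-based groupings is simply the mean pairwise distance among a group's vectors. A minimal sketch, assuming Euclidean distance (the paper does not specify the metric):

```python
import numpy as np

def tightness(vectors):
    """Mean pairwise Euclidean distance among a group's code vectors;
    a lower value means a tighter (more similar) group."""
    v = np.asarray(vectors, dtype=float)
    dists = [np.linalg.norm(v[i] - v[j])
             for i in range(len(v))
             for j in range(i + 1, len(v))]
    return float(np.mean(dists))

# A hypothetical bundle whose three code vectors sit close together:
print(round(tightness([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0]]), 3))  # 1.138
```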
2. Related Work

The concept of automatically connecting similar content items traces back to the pioneering work of Mayes and Kibby in the realm of educational hypertext [2, 3]. Initially, similarity-based navigation centered on keyword-level similarity, but due to its limited quality, this approach has since been supplanted by more robust semantic linking methodologies, often referred to as intelligent linking. One such approach is metadata-based linking, which computes similarity measures across various facets of metadata to generate higher-quality links [11].

In recent years, the focus has shifted towards ontology-based linking, particularly within the hypermedia research community. Ontology-based linking involves indexing documents with ontology terms and then leveraging ontological structures to identify similar documents [5, 12]. Although early investigations focused primarily on hypermedia applications, the educational domain subsequently adopted ontology-based linking methodologies [13].

In the programming domain, similarity assessment has mainly relied on content-level information. For example, Gross et al. [14] linked Java programming content based on the similarity of Abstract Syntax Trees (ASTs) that encompassed the entire body of examples and problems. However, this approach may overlook finer-grained similarities in smaller code fragments [15]. Recent approaches have explored concept-based similarity methodologies [16], that is, representing examples and problems as vectors of domain concepts and measuring similarity between these vectors. Further work attempted to calculate ontology-based similarity metrics for programming items [8].

With the advent of automated program representation techniques based on deep learning [9], the extraction of syntactic and semantic structural information from programming code snippets has gained traction. These techniques offer the potential to reduce the reliance on experts for designing isomorphic problem-example pairs, instead enabling the discovery of relevant examples within learning materials. While such similarity approaches hold promise across a range of code-related programming problems [17, 9], our study focuses on code comprehension problems, in which students are asked to predict the output or the final value of variables of a given program, rather than on program composition problems that require code writing.

3. Dataset

In this study, we used a Python programming dataset sourced from the PCEX system [6]. PCEX offers online access to working code examples and small completion problems referred to as "challenges". To increase learners' motivation and improve overall learning outcomes, problems and examples are further organized into bundles by domain experts, who group together problems and examples that target similar programming constructs and patterns. This combination approach was validated in previous research [6, 7], demonstrating its value across various metrics and stressing the importance of connecting learning and assessment at the level of specific programming patterns, in addition to their traditional integration at the level of broader course topics.

The PCEX dataset comprises 123 programming code problems and examples that span 13 topics, including Variables and Operations, If-Else Statements, Boolean Expressions, For Loops, and Nested Loops. The problems and examples in the dataset are organized into 52 bundles, with an average of 4 bundles per topic. Each bundle starts with a single fully worked-out example, followed by 1 to 3 similar problems; on average, each bundle has 1.35 problems. We used the current content organization in PCEX, which represents expert knowledge, as the gold standard for evaluating content recommendation approaches [18]. Figures 1 and 2 show two program examples from the same bundle under the Variables and Operations topic. Figure 3 shows an example from a different bundle within the same topic to illustrate the difference in bundle structures.

Figure 1: Example 1 from the same bundle

Figure 2: Example 2 from the same bundle

Figure 3: Example 3 from a different bundle

4. Methodology

In this study, we used the Subtree-based Attention Neural Network (SANN) [9] to encode programs into vector representations. We computed the cosine similarity between these vectors to recommend the closest examples for a given problem. To further analyze and group similar problems and examples, we employed clustering techniques such as DBSCAN and hierarchical clustering.

SANN is our primary model for encoding programs in
vector format. It has demonstrated its efficiency in capturing both syntactic and semantic information from programs in an interpretable manner and in understanding intricate code structure [9, 17, 19]. SANN operates by encoding source code into vector representations using subtrees extracted from the Abstract Syntax Tree (AST) representation of the code. These subtrees undergo a two-way embedding process in which each subtree and its constituent nodes are individually embedded, and the resulting embeddings are merged into a single embedded vector. Subsequently, the embedded vectors from both approaches are concatenated and passed through a time-distributed, fully connected layer, generating subtree vectors that incorporate both node-level and subtree-level information.

After the generation of subtree vectors, an attention neural network is employed to condense all subtree vectors into a single source code vector. The attention mechanism assigns a scalar weight to each subtree vector, facilitating the aggregation of all subtree vectors into a weighted average. These weights are determined through a normalized inner product between each subtree vector and a global attention vector, followed by a softmax function that ensures the weights sum to 1. The resulting weighted average of the subtree vectors, as determined by the attention mechanism, encapsulates the entire source code snippet; the SANN model thus leverages attention weights to prioritize the most important subtrees when generating the source code vector. We recursively extract all subtrees from an AST, ensuring comprehensive coverage of the code structure during the encoding process.

Following the extraction of code vectors, we calculated cosine similarity to find the closest example to a given problem for recommendation. Furthermore, we utilized clustering techniques, including DBSCAN and hierarchical clustering, to group similar problems and examples. DBSCAN is adept at identifying clusters of varying shapes and sizes while being robust to noise, whereas hierarchical clustering provides insight into the clustering structure through dendrogram analysis. Using these techniques, we aim to comprehensively explore the similarity structure within our dataset and facilitate the identification of cohesive groups of programming problems and examples.

5. Experiments and Results

5.1. Code Vector Extraction and Example Recommendation

We employed the Python AST parser (https://docs.python.org/3/library/ast.html) to parse Python programming code into ASTs. For SANN training, we partitioned our dataset into 80% training data and 20% testing data. During the splitting process, we ensured that no bundle was excluded from the training set, retaining all the diverse structural variations for comprehensive training. The embedding size for both subtree-based and node-based embeddings was set to 64, chosen from {64, 128, 256}; consequently, each source code vector was of size 128. Throughout the model training phase, we employed the Adamax optimizer [9] with a default learning rate of 0.001 to learn the weight matrices. The batch size was set to 32, and the maximum number of epochs was capped at 200, with an early stopping patience of 20 to prevent overfitting.

Table 1
Top-N accuracy for recommending worked examples

    Top-N    Accuracy (%)
    Top-1       70.97
    Top-3       83.10
    Top-5       87.32

The dataset has problems/challenges and examples bundled together based on similarity (these groups are called bundles), and different bundles are combined under different topics by the experts. The dataset therefore forms a hierarchy of topics and bundles: each topic contains several bundles, and each bundle contains similar challenges and examples. If a student faces difficulty with a problem, an example from the same bundle will be recommended. We trained the SANN model using only the topic information for challenges and examples, intentionally omitting any bundle information. Although bundles encapsulate more detailed, granular-level information about program structure, our objective was for SANN to learn this granular insight exclusively from the more superficial and abstract topic information. By training on topics alone and attempting to reconstruct the underlying bundles based on similarity in program pattern and structure, we aim to enable SANN to generalize effectively across diverse program structures and to evaluate its groupings against the expert-identified bundles. There are 13 topics and 52 bundles in the dataset. After training, we tested the trained model on the test data to predict the associated topics; SANN showed a testing accuracy of 88%. Afterward, we extracted the source code vectors for the problems and examples from SANN for further study. Finally, we investigated the effectiveness of these vectors in forming groups of similar examples and problems that can serve as a recommendation tool. We calculated the cosine similarity to find the closest example to a given problem: if the closest example comes from the same expert-identified bundle as the challenge, the recommended example is correct. We calculated the Top-N accuracy for N = 1, 3, and 5, as stated in Table 1. The experimental results suggest that our recommendation approach can effectively find similar worked examples for a given problem when a student is facing difficulty. However, we speculate that this accuracy could be improved with a bigger dataset for training SANN, since the current dataset has only 123 challenges and examples, and the average number of examples per problem is only 0.73. We plan to investigate the impact of dataset size on performance in the future.

We further hypothesize that since the bundles represent very similar challenges and examples, the corresponding vectors should reflect these similarities by being closer to programs of the same bundle than to others. The same hypothesis applies to topics; however, topics contain slightly less similar challenges and examples, so the vectors of a topic should be close to each other but not as close as those of a bundle. According to our hypothesis, the vectors in these bundles and topics should therefore show patterns in their tightness, where tightness refers to the average distance between the points of a bundle or topic. To calculate tightness, we used the expert labels from the dataset as the gold standard to show the effectiveness of our method and verify the hypothesis. For each topic/bundle, we first calculated the pairwise distances for all the points within it. Then, we calculated the mean of these pairwise distances, which is
the tightness within the vectors of the topic/bundle. Figure 4 shows a scatter plot of the bundles after projecting the vectors onto 2 dimensions with PCA.

Figure 4: Bundle clusters

Figure 5: Average tightness of topics

Figure 6: Average tightness of bundles

To verify our hypothesis, we calculated the degree of tightness for (1) vectors of the same bundle, (2) vectors of the same topic, and (3) all vectors in the dataset (the entire course). Figure 5 shows the topic-level tightness, and Figure 6 shows the bundle-level tightness. Here, we can see that bundles have lower distances, whereas topics have higher distances. We plotted the mean degree of tightness for topics, bundles, and the whole dataset in Figure 7 to get a clearer comparative view; for topics and bundles, the average tightness measures over all individual topics and bundles were calculated. The average tightness of bundles was found to be 0.4 units, and the average tightness of topics was found to be 0.8 units, which implies that the points of a bundle are much closer to each other than those of a topic. Finally, the average distance between all dataset samples was found to be 2.7 units. These results suggest that samples belonging to the same bundle are semantically very similar to each other; samples belonging to the same topic may show more variation than those within a bundle but are still more similar to each other than to samples from other topics in the course.

Figure 7: Average tightness of topics and bundles

5.2. Clustering Similar Examples

We investigated the effectiveness of multiple clustering approaches in identifying bundles of similar problems and examples. First, we employed DBSCAN clustering for topics, given its capability to handle irregularly shaped clusters when the number of clusters (topics) is unknown and differently structured problems and examples can fall under the same topic. Setting the epsilon value to 0.85 and the minimum-points parameter to 2, we identified 13 distinct clusters corresponding to topics, with only 2 points classified as noise. This follows from our assumption that each topic cluster must have at least 2 points: if a point is not in the vicinity of any other, it is better treated as noise than forced into a cluster. Figure 8 shows a scatter plot visualization that highlights the nonspherical nature of the clusters, indicating their irregular shapes.

Figure 8: Topic clusters using DBSCAN

We calculated the accuracy of the topic clustering using DBSCAN by determining a clustering error, which was assessed by comparing the assigned clusters to gold standard
clusters based on predefined topics. Specifically, we calculated how many items were assigned to clusters that did not match their actual topic labels. The clustering error averaged 11.69% (std. dev. 0.15) over all the problems and examples. The highest clustering error for a topic was 44%, for the topic "Strings." This can be considered an outlier because the code for string programs is likely similar to that of other topics in which some string operations are also required. It is important to note that three topics, "For Loops," "Nested Loops," and "Lists," were assigned to the same cluster. We found that these topics are very similar in structure and overlap considerably, e.g., using loops to traverse a list, or nesting For Loops to form Nested Loops.

Hierarchical clustering was utilized for bundles inside topics, as it allows for the exploration of hierarchical structures within the data and accommodates scenarios where the number of clusters is uncertain. DBSCAN may not be ideal here because the plotted points for bundles are unlikely to form irregular shapes, since problems and examples inside a bundle tend to be the most similar. Hierarchical clustering starts by treating each sample as a separate cluster and then repeatedly executes two steps: (i) identify the two clusters that are closest together, and (ii) merge these two most similar clusters. This iterative process continues until all the clusters have been merged.

The dendrogram from hierarchical clustering illustrated that samples sharing similar bundles and topics clustered closely together, with their parent clusters predominantly aligning with their respective topics. In addition, we assessed the closest sample for each item, categorizing the pairs based on bundle name and topic similarity. Based on these closest-sample data, we counted the items whose closest sample had (1) the same bundle name, (2) a different bundle name but the same topic, or (3) a different bundle name and topic. As evident in Figure 9, 43.9% of the items had their closest sample from the same bundle, and 30.9% had their closest pair from the same topic. However, 25.2% of the samples had their closest pair from a different topic and bundle. This result suggests that samples of the same topic are closer and contained within the same local region, and that samples belonging to the same bundle are even closer to each other. However, discrepancies between the clustering results and the expert labels emerged when problems and examples involved multiple topics or multiple bundles, for example, the use of loops in For Loops, Nested Loops, Lists, and Strings.

6. Discussion

In this study, we addressed the long-standing challenge of dynamically recommending relevant programming examples tailored to individual student needs within the context of computer science (CS) education. Our approach centered on leveraging the Subtree-based Attention Neural Network (SANN) model to extract nuanced syntactic and semantic similarities among programming examples, thus facilitating the identification of the analogous examples crucial for problem-solving support. SANN was trained only on the topic information of the examples; however, the dataset used also contains bundle information, in which similar problems and examples are bundled together under a topic. We used topic-level information about the problems and examples to gain deeper structural insight with SANN, which helps to identify similar worked examples for struggling students. Using the extracted code vectors, we recommend worked examples to students for a given problem based on vector similarity. The experiments suggest that the recommendation has an accuracy of 70.97%, 83.10%, and 87.32% for the Top-1, Top-3, and Top-5 recommendations, respectively.

In addition, we showed the effectiveness of these vectors by measuring the tightness of each topic and bundle in the course. The results suggest that the bundles represent very similar problems and examples, as reflected by the proximity of their corresponding vectors. In contrast, the topics contain multiple bundles with slightly less similar problems and examples; consequently, the vectors within a topic are close to each other but not as tightly clustered as those within a bundle. We further employed clustering techniques, including DBSCAN and hierarchical clustering, to group similar programming problems and examples and to alleviate the expert effort of bundling them. This outcome highlights the initial effectiveness of our approach in organizing and understanding the structural and semantic relationships inherent in programming education datasets. However, with the limited training data (the current dataset has only 123 problems and examples, and the average number of examples per problem is only 0.73), our clustering and recommendation performance did not fully align with the expert labels. We hypothesize that minor structural changes and overlapping topics in smaller problems and examples could be captured more accurately with a larger dataset; exploring this possibility is an interesting direction for future research.

The significance of our study lies in addressing a key challenge in CS education: identifying relevant and contextually appropriate programming examples [1]. By offering a methodological framework for dynamically recommending personalized examples, our study provides a scalable solution to the resource-intensive process of example selection traditionally reliant on domain experts. Our approach ef-
                                                                  fectively connects the extensive collection of programming
                                                                  examples with the unique needs of individual students, im-
                                                                  proving programming education by promoting more effi-
                                                                  cient and personalized learning experiences [6, 20].
Figure 9: Hierarchical clustering summary
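The agglomerative procedure described above (start with each sample as its own cluster, then repeatedly merge the two closest clusters) can be sketched with SciPy's `linkage` and `fcluster`; the vectors below are illustrative toy data, not items from our dataset, and the linkage/metric choices are one reasonable configuration rather than our exact setup:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Toy stand-ins for code vectors; real vectors would come from the
# code representation model.
vectors = np.array([
    [0.0, 0.0],   # bundle A
    [0.1, 0.0],   # bundle A
    [5.0, 5.0],   # bundle B
    [5.1, 5.0],   # bundle B
])

# Each sample starts as a singleton cluster; 'average' linkage repeatedly
# merges the two clusters with the smallest mean pairwise distance,
# producing the dendrogram encoded in Z.
Z = linkage(vectors, method="average", metric="euclidean")

# Cut the dendrogram into two flat clusters: samples 0 and 1 end up
# together, as do samples 2 and 3.
labels = fcluster(Z, t=2, criterion="maxclust")
```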
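The closest-sample categorization behind Figure 9, and one plausible reading of the Top-k measurement, can be sketched as follows. We assume code vectors are compared with cosine similarity; the function names, the toy data in the usage below, and the interpretation of "Top-k" as "a same-label item appears among the k most similar neighbors" are illustrative assumptions, not the paper's exact protocol:

```python
import numpy as np

def closest_sample_breakdown(vectors, bundles, topics):
    """For each item, find its nearest neighbor by cosine similarity and
    tally whether that neighbor shares its bundle, only its topic, or
    neither."""
    # Normalize rows so dot products equal cosine similarities.
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = v @ v.T
    np.fill_diagonal(sim, -np.inf)  # exclude self-matches
    nearest = sim.argmax(axis=1)

    counts = {"same bundle": 0, "same topic": 0, "different": 0}
    for i, j in enumerate(nearest):
        if bundles[i] == bundles[j]:
            counts["same bundle"] += 1
        elif topics[i] == topics[j]:
            counts["same topic"] += 1
        else:
            counts["different"] += 1
    return counts

def top_k_accuracy(vectors, labels, k):
    """Fraction of items whose k most similar neighbors include at least
    one item with the same label (e.g., the same bundle)."""
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    sim = v @ v.T
    np.fill_diagonal(sim, -np.inf)
    topk = np.argsort(-sim, axis=1)[:, :k]
    hits = [any(labels[j] == labels[i] for j in row)
            for i, row in enumerate(topk)]
    return sum(hits) / len(hits)
```

On a small hypothetical set of four vectors with known bundle and topic labels, `closest_sample_breakdown` yields exactly the same-bundle / same-topic / different split that Figure 9 reports in percentage form.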
                                                                  7. Limitations and Future Work
                                                                  There are a few limitations that need to be addressed in
this study. Firstly, SANN was trained on the topics associated with each problem and example, as labeled by experts.
                                                                  This current setup limits our ability to use a vast corpus of
worked examples and programming problems that are not labeled with topics. In the future, we want to eliminate this limitation by training the SANN model in a topic-agnostic way. We propose training the model not on explicit topic information but on the underlying code structure, using an encoder-decoder architecture. In this approach, the encoder would process the source code to generate a latent representation that captures the structural and semantic nuances of the code, and the decoder would then reconstruct the code from this latent representation. This unsupervised learning method aims to enable the model to understand and encode the intricate structure of the code more effectively, leading to better generalization and more accurate recommendations based on structural similarities rather than predefined topic labels.

Additionally, when we explored clustering techniques, we observed that some worked examples are similar even though they come from different topics. This happens because some topics overlap with previously learned topics; for example, List problems might require knowledge of loops. In such cases, we might in the future consider sub-categories of these bundles to recommend examples from previous topics when necessary, based on a student's difficulty progression. For example, if a student struggles with traversing a list due to difficulties using loops, they would benefit from revisiting similar examples that focus on loops from previously covered topics.

Another future direction of this work is to make the recommendations more personalized based on student knowledge. We want to track students' learning at various stages of the course and incorporate that information when recommending examples for the current problems they face. The tracing of student learning can also be done at the topic level. If a student faces difficulty with a particular topic, this can be important information for the recommender system, along with the problem's code structure. In addition, repeated struggle with the same topic can act as an alarm for instructors, indicating that a student needs personalized intervention and support. We also intend to add baselines from the literature in a future study to show the comparative effectiveness of our framework.

8. Conclusion

In this study, we used the Subtree-based Neural Network (SANN) model to recommend relevant programming examples tailored to individual student needs in computer science (CS) education. Through clustering techniques, including DBSCAN and hierarchical clustering, we effectively organized the structural and semantic relationships of problems and examples to guide the recommendation of similar practices to programming students. Our approach offers a scalable solution to the resource-intensive process of example selection, providing contextually appropriate learning resources tailored to individual student needs.

References

[1] P. Brusilovsky, C. Peylo, Adaptive and intelligent web-based educational systems, International Journal of Artificial Intelligence in Education 13 (2003) 156–169.
[2] M. Kibby, J. Mayes, Towards intelligent hypertext, Hypertext: Theory into Practice (1989) 164–172.
[3] J. T. Mayes, M. R. Kibby, H. Watson, StrathTutor: The development and evaluation of a learning-by-browsing system on the Macintosh, Computers & Education 12 (1988) 221–229.
[4] K. R. Koedinger, A. T. Corbett, C. Perfetti, The knowledge-learning-instruction framework: Bridging the science-practice chasm to enhance robust student learning, Cognitive Science 36 (2012) 757–798.
[5] L. Carr, W. Hall, S. Bechhofer, C. Goble, Conceptual linking: ontology-based open hypermedia, in: Proceedings of the 10th International Conference on World Wide Web, 2001, pp. 334–342.
[6] R. Hosseini, K. Akhuseyinoglu, P. Brusilovsky, L. Malmi, K. Pollari-Malmi, C. Schunn, T. Sirkiä, Improving engagement in program construction examples for learning Python programming, International Journal of Artificial Intelligence in Education 30 (2020) 299–336.
[7] K. Akhuseyinoglu, A. Klašnja-Milićević, P. Brusilovsky, The impact of connecting worked examples and completion problems for introductory programming practice, in: European Conference on Technology Enhanced Learning (EC-TEL 2024), Lecture Notes in Computer Science, Springer International Publishing, 2024.
[8] R. Hosseini, P. Brusilovsky, A study of concept-based similarity approaches for recommending program examples, New Review of Hypermedia and Multimedia 23 (2017) 161–188.
[9] M. Hoq, S. R. Chilla, M. Ahmadi Ranjbar, P. Brusilovsky, B. Akram, SANN: Programming code representation using attention neural network with optimized subtree extraction, in: Proceedings of the 32nd ACM International Conference on Information and Knowledge Management, 2023, pp. 783–792.
[10] K. Muldner, J. Jennings, V. Chiarelli, A review of worked examples in programming activities, ACM Transactions on Computing Education 23 (2022) 1–35.
[11] D. Tudhope, C. Taylor, Navigation via similarity: automatic linking based on semantic closeness, Information Processing & Management 33 (1997) 233–242.
[12] M. Crampes, S. Ranwez, Ontology-supported and ontology-driven conceptual navigation on the World Wide Web, in: Proceedings of the 11th ACM Conference on Hypertext and Hypermedia, 2000, pp. 191–199.
[13] P. Dolog, N. Henze, W. Nejdl, Logic-based open hypermedia for the semantic web, in: Proceedings of the International Workshop on Hypermedia and the Semantic Web, Hypertext 2003 Conference, 2003.
[14] S. Gross, B. Mokbel, B. Hammer, N. Pinkwart, How to select an example? A comparison of selection strategies in example-based learning, in: Proceedings of the Intelligent Tutoring Systems: 12th International Conference, ITS 2014, Springer, 2014, pp. 340–347.
[15] G. Weber, A. Mollenberg, ELM-PE: A knowledge-based programming environment for learning LISP, 1994.
[16] R. Hosseini, P. Brusilovsky, Example-based problem solving support using concept analysis of programming content, in: Proceedings of the Intelligent Tutoring Systems: 12th International Conference, ITS 2014, Springer, 2014, pp. 683–685.
[17] M. Hoq, Y. Shi, J. Leinonen, D. Babalola, C. Lynch, T. Price, B. Akram, Detecting ChatGPT-generated code submissions in a CS1 course using machine learning models, in: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 1, 2024, pp. 526–532.
[18] A. J. Sabet, I. Alpizar-Chacon, J. Barria-Pineda, P. Brusilovsky, S. Sosnovsky, Enriching intelligent textbooks with interactivity: When smart content allocation goes wrong, in: Proceedings of the 4th International Workshop on Intelligent Textbooks, volume 3192, 2022.
[19] M. Hoq, J. Vandenberg, B. Mott, J. Lester, N. Norouzi, B. Akram, Towards attention-based automatic misconception identification in introductory programming courses, in: Proceedings of the 55th ACM Technical Symposium on Computer Science Education V. 2, 2024, pp. 1680–1681.
[20] M. Hoq, P. Brusilovsky, B. Akram, Analysis of an explainable student performance prediction model in an introductory programming course, in: Proceedings of the 16th International Conference on Educational Data Mining, 2023, pp. 79–90.