<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Code in a CS1 Course</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Muntasir Hoq</string-name>
          <email>mhoq@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yang Shi</string-name>
          <email>yshi26@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Juho Leinonen</string-name>
          <email>juho.leinonen@auckland.ac.nz</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Damilola Babalola</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Collin Lynch</string-name>
          <email>cflynch@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bita Akram</string-name>
          <email>bakram@ncsu.edu</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>North Carolina State University</institution>
          ,
          <country country="US">United States</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>The University of Auckland</institution>
          ,
          <country country="NZ">New Zealand</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2023</year>
      </pub-date>
      <abstract>
        <p>The emergence of ChatGPT has raised concerns about students potentially using it for cheating. Computer Science (CS) educators are becoming worried because of the potential short and long-term adverse efects it might have on students. However, it is unclear to what extent ChatGPT-generated code can be distinguished from student-written code in introductory programming courses. In this work, we analyze how well student-written and ChatGPT-generated code can be automatically distinguished. We use an openly available dataset of student program solutions for CS1 assignments and have ChatGPT generate code for the same assignments. We evaluate the performance of both traditional machine learning models, such as SVM and XGBoost, as well as Abstract Syntax Tree-based deep learning models, such as code2vec and ASTNN, in distinguishing between student and ChatGPT code. The results suggest that both traditional machine learning models and AST-based deep learning models can be very efective in detecting whether a student or ChatGPT wrote the code in an educational setting with accuracies higher ChatGPT, large language model, student programming code analysis, introductory programming course, plagiarism detection, ethical consideration of LLMs AIED2023 Empowering Education with LLMs - the Next-Gen Interface and Content Generation Workshop, June 03-07, htp:/ceur-ws.org CEUR Workshop Proceedings (CEUR-WS.org) ISN1613-073 3https://github.blog/2022-09-07-research-quantifying-github-copilots-impact-on-developer-productivity-and-happiness/</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Plagiarism is a common problem in introductory programming courses [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Previous work has
found, for example, that students might resort to plagiarism due to struggling [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and might
be confused about what constitutes plagiarism in programming [
        <xref ref-type="bibr" rid="ref3 ref4 ref5">3, 4, 5</xref>
        ]. In the context of
programming, plagiarism can take various forms, such as copying code from the internet (e.g.,
      </p>
      <sec id="sec-1-1">
        <title>StackOverflow), sharing solutions between students, and contract cheating.</title>
        <p>Recently, a new possible type of plagiarism has emerged: using powerful, large language
model-based AI models and tools such as ChatGPT1 and GitHub Copilot2 to create solutions
to programming exercises. While these tools might help professional programmers develop
code more eficiently</p>
        <p>
          3 and can be used by instructors to create educational resoruces [
          <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
          ],
programming educators have raised concerns around potential student over-reliance on these
CEUR
Workshop
Proceedings
models [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Students using such models without attributing the created code to the model might
be considered a new type of plagiarism. Recent work has found that up to 80% of introductory
programming problems can be successfully solved by state-of-the-art AI models [9, 10] and that
this performance is better than the performance of average students [10]. Similar performance
has been observed for more complex data structures and algorithms-level exercises [11].
        </p>
        <p>
          One challenge related to plagiarism or student over-reliance on AI models is that it might be
dificult to detect. Programming plagiarism detection tools that have been traditionally used
in introductory programming courses such as MOSS4 and JPlag [12] are based on comparing
student submissions to each other. However, recent AI models do not work deterministically,
and thus traditional methods for detecting plagiarism in introductory programming might be
infeasible for detecting AI-created code. Some recent eforts aim to detect AI-generated content
using AI models, expecting to mitigate the uncertainties of advanced AI. However, they only
achieve mediocre performance that is far from adaptable in educational use [13] when applied
to essays, and they are not specific to programming education. While there are methods that
focus on the process of writing code [
          <xref ref-type="bibr" rid="ref2">2, 14, 15</xref>
          ], these require tailored IDEs and have thus not
been widely adopted.
        </p>
        <p>In this work, we study the automatic detection of ChatGPT-created programs for introductory
programming exercises. We use a publicly available dataset of student-written programs in a
CS1 course5 and use ChatGPT to create programs for the same exercises. We evaluate diferent
classification methods for identifying code as AI-generated or student-written. Our research
question is: “How well can ChatGPT-created programs be distinguished from student-created
programs in CS1 courses?”</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>Cheating and plagiarism are more common issues in introductory-level courses than in
higherlevel courses, i.e., ones attended by graduate students [16]. Studies have shown that cheating is
common among struggling students, aided with wide tolerance despite awareness of university
policy [16, 17]. Moreover, the rise of online courses made the scenario even worse [18, 19].</p>
      <p>As artificial intelligence and natural language processing technologies have advanced,
distinguishing between human and machine-generated text has become increasingly important.
Several academic studies have proposed several methods to detect machine-generated text.
In a previous study [20], a method for detecting computer-generated text (CGT) in academic
papers was proposed. The method involves extracting topic-related text features, stemming
tokens to avoid confusion, and scoring features. These features were used to train a nearest
neighbor classifier binary classification model due to its simplicity on small datasets. The
methodology was evaluated using various datasets and demonstrated high accuracy in detecting
CGT in diferent domains, such as online reviews, social media posts, and scientific articles.
For detecting programming plagiarism, Huang et al. [21] introduced a novel code plagiarism
detection approach using the XGBoost incremental learning algorithm. The method involves
extracting 11 code features through static analysis, filtering out weaker features, and training an</p>
      <sec id="sec-2-1">
        <title>4https://theory.stanford.edu/~aiken/moss/ 5https://codeworkout.cs.vt.edu/</title>
        <p>XGBoost model to identify plagiarism. The approach demonstrated high accuracy in detecting
code plagiarism, surpassing existing techniques. It is proposed as an efective tool for plagiarism
detection in academic and software industry scenarios. Kechao et al. [ 22] proposed a student
program plagiarism detection approach using the CloSpan data mining algorithm. It mines
comparable code segments, computes similarities between programs, and generates a plagiarism
report. Experimental results showed improved precision and detection eficiency compared
to the MOSS tool, providing more detailed statistical information and visualizing comparable
code fragments. However, with the rise of the ChatGPT model, these methods need to be
at least validated, as the similarity-based methods may not apply in cases when the code is
automatically generated as compared with copied from other students or resources.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Method</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset</title>
        <p>We use the student-written code from a publicly available dataset obtained from the
CodeWorkout6 platform. The CodeWorkout dataset contains student code from an introductory
programming course in Java. There are 50 programming problems. The programs are graded
with the ratio of test case passing. We use the first 10 problems from the Spring 2019 semester
in our experiment. Uncompilable submissions are removed from the dataset as uncompilable
code can not be parsed into ASTs. Incorrect submissions are also removed from the dataset
since the intermediate states of a problem-solving process. Moreover, ChatGPT correctly solves
programming problems from introductory programming courses, and we do not want our
detection process biased towards detecting correct and incorrect codes. The programming
problems include diferent introductory Java programming concepts, such as methods, variable
declaration, data types, conditionals, strings, etc. Students in this course were given a prompt
with the problem statement for each assignment with a function prototype.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. ChatGPT-generated Code</title>
        <p>A dataset comprising programming code generated by ChatGPT is created for the purpose of
this study. To create this dataset, we present ChatGPT with the problem statements of the first
ten problems. We used the GUI to interact with ChatGPT (March 2023). We also provide the
ChatGPT prompt with the function prototype given to the students with the problem statement
in the CodeWorkout platform to ensure that ChatGPT has the same information as students
had when they were constructing their programs. In order to maintain uniformity and balance
in the dataset, we collect 300 ChatGPT-generated codes for each assignment, which are used
for comparison with the student-written codes. The correctness of ChatGPT-generated code
is manually examined to ensure that we only include the correct programs. To generate each
instance of a solution to a specific problem, we regenerate the response of ChatGPT to get
diferent solutions. The characteristics of both datasets are provided in Table 1.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Automatically Distinguishing ChatGPT and student code</title>
        <p>To detect student-written codes and ChatGPT-generated code, we use both diferent traditional
machine learning techniques and more recent neural methods. The traditional methods include
Support Vector Machine (SVM) [23, 24, 25] and Extreme Gradient Boosting (XGBoost) [21, 24, 25].
The more recent methods include code2vec [26], and ASTNN [27].</p>
        <p>code2vec [26] is an attention-based neural network model that uses a code representation
technique to learn vector embeddings for programming code tokens, such as function and
variable names and their surrounding code context. The model was first introduced for predicting
method names, given their implementation for analyzing professionally-written Java codes, and
is also known for representing programming code information in educational domains such as
bug detection [28, 29], performance prediction [30, 31] and skill representation [32, 33].</p>
        <p>The code2vec model comprises two encoders and one attention mechanism: a path-based
encoder and an attention-based classifier. The path-based encoder takes a code snippet
(represented as an abstract syntax tree, AST) as input and represents it as a set of paths connecting
two leaf nodes. These paths are then encoded into a fixed-length vector with the attention
mechanism by calculating the importance of diferent paths. It allows for exploring the most
important paths and structures of the programming code, which provides insights into
understanding the associated task. Finally, the model learns to predict the label of the code snippet
from the code vectors. In this study, we use code2vec to classify programming codes into
student-written codes and ChatGPT-generated codes based on their learned vector embeddings.</p>
        <p>ASTNN [27] is an Abstract Syntax Tree-based Neural Network that uses an AST-based
neural network approach to code classification. The model takes the Abstract Syntax Tree
(AST) of a code snippet as input and generates a vector representation of the code’s structure.
It has demonstrated strong performance in various code classification tasks, including code
correctness prediction, detecting code patterns, identifying plagiarism, and detecting code
clones [27, 34, 35, 36, 37, 38]. The basic idea is to split an AST into diferent statement trees
and generate vectorial code representations. These statement trees are then embedded into
statement tree vectors, and a bidirectional recurrent neural network is used to encode the
statements with their sequential dependency. Further feature extraction layers then process
the embeddings before the final identification on student-written or ChatGPT-generated code.
ASTNN is designed to capture the high-level structure of the code and is, therefore, well-suited
for classifying code snippets. We also developed an ASTNN model for the AI code identification
task in our experiments for comparisons.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experiments</title>
      <p>Experiments are designed to distinguish between student-written codes and ChatGPT-generated
codes. To detect the author of a code, we perform a binary classification task, where we identify
if a piece of code is being written by a student (class 1) or generated using ChatGPT (class 0).
We use accuracy, precision, recall, and F1-score as the evaluation metrics. Utilizing a variety of
evaluation metrics enables a complete understanding of model strengths and weaknesses [39].</p>
      <p>We also use diferent traditional Machine Learning models, including SVM and XGBoost, as
the baselines to compare with the AST-based deep learning models. For these models, we use
TFIDF [31] to embed and vectorize the programming codes for the inputs of these traditional
ML models. We use 10-fold cross-validation to tune the hyperparameters of the traditional ML
models. For SVM, the kernel is set to ‘poly’ from the set {‘linear’, ‘poly’, ‘rbf’}, C to 10 from
the set {0.1, 1, 10}. For XGBoost, we set the value of max_depth to 10 from the set of {3, 6, 10},
gamma to 1 from the set {1, 5, 9}, and n_estimator to 180.</p>
      <p>We performed a manual search to tune the hyperparameters of the code2vec and ASTNN
models. The dataset is split in a 3:1:1 ratio for training, validating, and testing in the experiments.
The best hyperparameters are selected using the validating dataset, while the results are reported
on the testing dataset. For the code2vec model, we set the embedding size of the model to 128
from a set of {64, 128, 256} and padded the context path sequence for each code submission to
100. For the ASTNN model, we set the embedding size to 128 from a set of {64, 128, 256}. The
maximum epoch is set to 200 with a patience of 50 to prevent overfitting. To generate the ASTs
from the programming codes, an open-source tool called javalang7 is used. javalang provides a
lexer and a parser for Java programming language.</p>
    </sec>
    <sec id="sec-5">
      <title>5. Results</title>
      <p>To investigate how well student-written and ChatGPT-generated codes can be distinguished in
an introductory programming course, we perform a classification task using SVM, XGBoost,
code2vec, and ASTNN model. In the experiments, we randomly selected 4000 programs in
the training set and 1083 programs in the validation and test set. The testing results of the
experiment are shown in Table 2.</p>
      <p>From Table 2, one can see that AI models can distinguish between student-written codes
and ChatGPT-generated codes. All identification accuracies and F1 scores are higher than 90%,
which means that distinguishing is accurate with any of the models analyzed in the experiments.
SVM has the lowest performance compared to other models, and XGBoost outperforms SVM.
Table 2 also shows that code2vec outperforms ASTNN in terms of accuracy, recall, and f1-score
with the value of 0.95, 0.95, and 0.95, respectively. However, the precision of ASTNN is the
highest, with a value of 0.99. This implies that the correctly predicted positive instances by
ASTNN over all the predicted positive instances is 99%. However, from the recall value, we can
see that ASTNN has a lower recall value of 0.87. Therefore, it predicts fewer positive instances
over all the actual positive instances than code2vec (recall value: 0.95).</p>
      <p>From these experimental results, we can conclude that code2vec performs better than ASTNN
in classifying student-written codes and ChatGPT-generated codes. In general, the AST-based
models have better performance than the traditional ML models in terms of accuracy, precision,
recall, and F1-score. However, the performance of the traditional ML models is competitive
compared to the AST-based models though traditional ML models deal with codes as textual
data, whereas AST-based models try to encode the syntactic and semantic information of the
codes. This indicates that there are enough textual diferences between student-written codes
and ChatGPT-generated codes. When looking into the diferences between student-written code
and ChatGPT code (see e.g. Figure 1), we found that ChatGPT uses more advanced operations
like the ternary operator while students typically use nested if-else statements instead. This
was supported by looking further into the attention weights of the code2vec model for the
caughtSpeeding problem (Fig 1) and observing that the most important features to detect student
versus ChatGPT code relied on paths related to using ternary versus nested if-else.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion</title>
      <p>The emergence of advanced AI tools (such as ChatGPT) could cause adverse efects in
introductory programming such as students using AI tools to solve homework without understanding
the generated code. However, in this paper, we found evidence to assure computing education
practitioners that ChatGPT-generated code can be detected in introductory programming. We
found that ChatGPT-generated code submissions could be detected automatically with a
relatively high performance (95% accuracy). The ChatGPT code submissions also share patterns that
human instructors could identify such as using more advanced programming constructs than
the typical student would use. While it could still be worthwhile to investigate how students
interact with such tools, they are efortless to detect by 1) an automated tool (as our experiment
suggests) and 2) instructor inspections.</p>
      <p>In addition, we also went a step further on preliminary explorations of the code generated
by diferent prompts. Through simple attempts, we found that even if students attempt to
make simple modifications to the prompts for ChatGPT to generate code that mimics novice
programming code, the code generated is still distinguishable from students’ own written code.
For example, when we tried to add the sentence “Write the code as a novice programmer”, or
“Act as a novice programmer while programming”, etc., the code generated is still structurally
diferent from actual students’ programming code (e.g., the ChatGPT-generated code will still
use ternary operators). While more solid experiments are still yet to be designed to validate
this finding systematically, this preliminary result suggests that our findings on detecting the
AI-generated code remain promising across diferent scenarios.</p>
      <p>There are many potential next steps for instructors after detecting students’ cheating. As
a data-driven model detector, we acknowledge that there is no evidence to be presented to
students showing that they committed cheating. However, many mechanisms could be designed
to prevent cheating. For example, a cheating alarm system could be created to prevent students
from submitting possibly AI-generated source code. When the submitted file is likely to have
been generated by AI, instructors could request a re-submission automatically or ask the students
for an additional check. Since possibly cheating students may face challenges in learning, the
system could also serve as a detector of learning dificulty, and students who trigger the alarms
frequently could be targeted by possible support interventions to learn related concepts.</p>
      <p>Even though we consider using ChatGPT as potential cheating in this work, a suggestion we
make is that schools and teachers should consider teaching students how to use AI ethically
and eficiently. AI-based tools are double-edged swords, which can be over-relied on but might
also help students learn concepts [40, 41]. It remains an efort and active area in the computing
education domain to teach students when and how [42] to seek help from AI-driven tools.</p>
      <sec id="sec-6-1">
        <title>6.1. Limitations</title>
        <p>While this study provides an extensive exploration into detecting code generated by ChatGPT,
some limitations should be considered. One of the main limitations of this study is that the
dataset used for ChatGPT-generated code has a small number of variations on a small set of
original problems (10 problems). The prompts from the introductory programming course
considered were straightforward and short, limiting the variety of code structures and syntax
that ChatGPT would produce. It could cause the detecting model to be overfitted to specific
kinds of variations, and future work should study other types of problems to verify whether
our findings can be generalized, e.g., to more complicated problems. In addition, our current
research makes an assumption that novice students know how to use ChatGPT to generate
code. While there is little existing research (see e.g. [43, 44] for some preliminary results)
systematically investigating how students interact with large language models and tools based
on them such as ChatGPT, chances are that many students have not heard about it or do not
have the ability to work with such tools.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>We investigated the possibility of diferentiating between student-written code and code
generated by ChatGPT, a large language model developed by OpenAI, in the context of an
introductory programming course. This study uses the CodeWorkout dataset with additional
ChatGPT-generated data for the identification task. The study uses AST-based deep learning
models, such as code2vec and ASTNN, and compares them with traditional machine learning,
such as SVM and XGBoost, on their performance. The results show that all four models achieved
high levels of accuracy of 90% and above, with code2vec achieving the best accuracy of 95%.
[9] P. Denny, V. Kumar, N. Giacaman, Conversing with copilot: Exploring prompt engineering
for solving cs1 problems using natural language, in: Proceedings of the 54th ACM Technical
Symposium on Computer Science Education V. 1, 2023, pp. 1136–1142.
[10] J. Finnie-Ansley, P. Denny, B. A. Becker, A. Luxton-Reilly, J. Prather, The robots are coming:
Exploring the implications of openai codex on introductory programming, in: Australasian
Computing Education Conference, 2022, pp. 10–19.
[11] J. Finnie-Ansley, P. Denny, A. Luxton-Reilly, E. A. Santos, J. Prather, B. A. Becker, My ai
wants to know if this will be on the exam: Testing openai’s codex on cs2 programming
exercises, in: Proc. of the 25th Australasian Computing Education Conf., 2023, pp. 97–104.
[12] L. Prechelt, G. Malpohl, M. Philippsen, et al., Finding plagiarisms among a set of programs
with jplag., J. Univers. Comput. Sci. 8 (2002) 1016.
[13] G. Rosalsky, E. Peaslee, This 22-year-old is trying to save us from chatgpt before it changes
writing forever, NPR. Archived from the original on January 18 (2023).
[14] J. Leinonen, K. Longi, A. Klami, A. Ahadi, A. Vihavainen, Typing patterns and
authentication in practical programming exams, in: Proceedings of the 2016 ACM Conference on
Innovation and Technology in Computer Science Education, 2016, pp. 160–165.
[15] K. Longi, J. Leinonen, H. Nygren, J. Salmi, A. Klami, A. Vihavainen, Identification of
programmers from typing patterns, in: Proceedings of the 15th Koli Calling Conference
on Computing Education Research, 2015, pp. 60–67.
[16] J. Sheard, S. Markham, M. Dick, Investigating diferences in cheating behaviours of it
undergraduate and graduate students: The maturity and motivation factors, Higher
Education Research &amp; Development 22 (2003) 91–108.
[17] J. Sheard, M. Dick, S. Markham, I. Macdonald, M. Walsh, Cheating and plagiarism:
Perceptions and practices of first year it students, in: Proceedings of the 7th Annual Conference
on Innovation and Technology in Computer Science Education, 2002, pp. 183–187.
[18] S. Dendir, R. S. Maxwell, Cheating in online courses: Evidence from online proctoring,</p>
      <p>Computers in Human Behavior Reports 2 (2020) 100033.
[19] M. Jha, S. J. Leemans, R. Berretta, A. A. Bilgin, L. Jayarathna, J. Sheard, Online
assessment and covid: Opportunities and challenges, in: Australasian Computing Education
Conference, 2022, pp. 27–35.
[20] A. Lavoie, M. Krishnamoorthy, Algorithmic detection of computer generated text, arXiv
preprint arXiv:1008.0706 (2010).
[21] Q. Huang, G. Fang, K. Jiang, An Approach of Suspected Code Plagiarism Detection Based
on XGBoost Incremental Learning, Proceedings of the 2019 International Conference on
Computer, Network, Communication and Information Systems (CNCI 2019) (2019).
[22] W. Kechao, W. Tiantian, Z. Mingkui, W. Zhifei, R. Xiangmin, Detection of plagiarism in
students’ programs using a data mining algorithm, in: Proceedings of 2012 2nd International
Conference on Computer Science and Network Technology, 2012, pp. 1318–1321.
[23] A. Eppa, A. Murali, Source code plagiarism detection: A machine intelligence approach,
in: 2022 IEEE Fourth International Conference on Advances in Electronics, Computers
and Communications (ICAECC), 2022, pp. 1–7. doi:1 0 . 1 1 0 9 / I C A E C C 5 4 0 4 5 . 2 0 2 2 . 9 7 1 6 6 7 1 .
[24] M. Hoq, P. Brusilovsky, B. Akram, Analysis of an explainable student performance
prediction model in an introductory programming course, in: Proceedings of the 16th
International Conference on Educational Data Mining (EDM) 2023, 2023.
[25] M. Hoq, M. N. Uddin, S.-B. Park, Vocal feature extraction-based artificial intelligent model
for parkinson’s disease detection, Diagnostics 11 (2021) 1076.
[26] U. Alon, M. Zilberstein, O. Levy, E. Yahav, code2vec: Learning distributed representations
of code, Proceedings of the ACM on Programming Languages 3 (2019) 1–29.
[27] J. Zhang, X. Wang, H. Zhang, H. Sun, K. Wang, X. Liu, A novel neural source code
representation based on abstract syntax tree, in: 2019 IEEE/ACM 41st International
Conference on Software Engineering (ICSE), IEEE, 2019, pp. 783–794.
[28] Y. Shi, K. Shah, W. Wang, S. Marwan, P. Penmetsa, T. Price, Toward semi-automatic
misconception discovery using code embeddings, in: LAK21: 11th International Learning
Analytics and Knowledge Conference, 2021, pp. 606–612.
[29] Y. Shi, Y. Mao, T. Barnes, M. Chi, T. W. Price, More with less: Exploring how to use
deep learning efectively through semi-supervised learning for automatic bug detection in
student code., in: EDM’21, 2021, pp. 446–453.
[30] Y. Shi, M. Chi, T. Barnes, T. Price, Code-dkt: A code-based knowledge tracing model for
programming tasks, in: In Proc. of the 15th Int. Conf. on Educational Data Mining, 2022,
pp. 50–61.
[31] M. Hoq, P. Brusilovsky, B. Akram, SANN: A subtree-based attention neural network model
for student success prediction through source code analysis, in: 6th Educational Data
Mining in Computer Science Education (CSEDM) Workshop, 2022.
[32] Y. Shi, Interpretable code-informed learning analytics for cs education, in: Companion</p>
      <p>Proc. 13th Int. Conf. on Learning Analytics &amp; Knowledge, 2023, pp. 180–187.
[33] Y. Shi, R. Schmucker, M. Chi, T. Barnes, T. Price, Kc-finder: Automated knowledge
component discovery for programming problems., in: Proceedings of the 16th International
Conference on Educational Data Mining (EDM) 2023, 2023.
[34] W. Wang, G. Li, B. Ma, X. Xia, Z. Jin, Detecting code clones with graph neural network
and flow-augmented abstract syntax tree, in: 2020 IEEE 27th International Conference on
Software Analysis, Evolution and Reengineering (SANER), IEEE, 2020, pp. 261–271.
[35] C. Fang, Z. Liu, Y. Shi, J. Huang, Q. Shi, Functional code clone detection with syntax
and semantics fusion learning, in: Proceedings of the 29th ACM SIGSOFT International
Symposium on Software Testing and Analysis, 2020, pp. 516–527.
[36] Y. Mao, Y. Shi, S. Marwan, T. W. Price, T. Barnes, M. Chi, Knowing both when and where:
Temporal-astnn for early prediction of student success in novice programming tasks, in:
In Proceedings of the 14th International Conference on Educational Data Mining (EDM)
2021, 2021.
[37] M. Lei, H. Li, J. Li, N. Aundhkar, D.-K. Kim, Deep learning application on code clone
detection: A review of current knowledge, Journal of Systems and Software 184 (2022)
111141.
[38] M. A. Fokam, R. Ajoodha, Influence of contrastive learning on source code plagiarism
detection through recursive neural networks, in: 2021 3rd International Multidisciplinary
Information Technology and Engineering Conference (IMITEC), IEEE, 2021, pp. 1–6.
[39] S. Sarsa, J. Leinonen, A. Hellas, et al., Empirical evaluation of deep learning models for
knowledge tracing: Of hyperparameters and metrics on performance and replicability,
Journal of Educational Data Mining 14 (2022).
[40] K. Holstein, B. M. McLaren, V. Aleven, Student learning benefits of a mixed-reality teacher
awareness tool in ai-enhanced classrooms, in: Artificial Intelligence in Education: 19th
International Conference, AIED 2018, London, UK, June 27–30, 2018, Proceedings, Part I
19, Springer, 2018, pp. 154–168.
[41] B. Akram, W. Min, E. Wiebe, B. Mott, K. E. Boyer, J. Lester, Improving stealth
assessment in game-based learning with lstm-based analytics, in: International Conference on
Educational Data Mining, 2018, pp. 208–218.
[42] S. Marwan, J. Jay Williams, T. Price, An evaluation of the impact of automated programming
hints on performance and learning, in: Proceedings of the 2019 ACM Conference on
International Computing Education Research, 2019, pp. 61–70.
[43] J. Prather, B. N. Reeves, P. Denny, B. A. Becker, J. Leinonen, A. Luxton-Reilly, G. Powell,
J. Finnie-Ansley, E. A. Santos, ”it’s weird that it knows what i want”: Usability and
interactions with copilot for novice programmers, arXiv preprint arXiv:2304.02491 (2023).
[44] M. Kazemitabaar, J. Chow, C. K. T. Ma, B. J. Ericson, D. Weintrop, T. Grossman, Studying the
efect of ai code generators on supporting novice learners in introductory programming,
in: Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems,
2023, pp. 1–23.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>I. Albluwi</surname>
          </string-name>
          ,
          <article-title>Plagiarism in programming assessments: a systematic review</article-title>
          ,
          <source>ACM Transactions on Computing Education (TOCE) 20</source>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>28</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hellas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leinonen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ihantola</surname>
          </string-name>
          ,
          <article-title>Plagiarism in take-home exams: help-seeking, collaboration, and systematic cheating</article-title>
          ,
          <source>in: Proceedings of the 2017 ACM Conference on Innovation and Technology in Computer Science Education</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>238</fpage>
          -
          <lpage>243</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>G.</given-names>
            <surname>Cosma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Joy</surname>
          </string-name>
          ,
          <article-title>Towards a definition of source-code plagiarism</article-title>
          ,
          <source>IEEE Transactions on Education</source>
          <volume>51</volume>
          (
          <year>2008</year>
          )
          <fpage>195</fpage>
          -
          <lpage>200</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Joy</surname>
          </string-name>
          , G. Cosma, J. Y.
          <string-name>
            <surname>-K. Yau</surname>
            ,
            <given-names>J. Sinclair,</given-names>
          </string-name>
          <article-title>Source code plagiarism-a student perspective</article-title>
          ,
          <source>IEEE Transactions on Education</source>
          <volume>54</volume>
          (
          <year>2010</year>
          )
          <fpage>125</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Joy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sinclair</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Boyatt</surname>
          </string-name>
          ,
          <string-name>
            <surname>J.-K. Yau</surname>
          </string-name>
          , G. Cosma,
          <article-title>Student perspectives on source-code plagiarism</article-title>
          ,
          <source>International Journal for Educational Integrity</source>
          <volume>9</volume>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Denny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hellas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leinonen</surname>
          </string-name>
          ,
          <article-title>Automatic generation of programming exercises and code explanations using large language models</article-title>
          ,
          <source>in: Proceedings of the 2022 ACM Conference on International Computing Education Research-Volume</source>
          <volume>1</volume>
          ,
          <year>2022</year>
          , pp.
          <fpage>27</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>P.</given-names>
            <surname>Denny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarsa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hellas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Leinonen</surname>
          </string-name>
          ,
          <article-title>Robosourcing educational resources-leveraging large language models for learnersourcing</article-title>
          ,
          <source>arXiv preprint arXiv:2211.04715</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Becker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Denny</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Finnie-Ansley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Luxton-Reilly</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Prather</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Santos</surname>
          </string-name>
          ,
          <article-title>Programming is hard-or at least it used to be: Educational opportunities and challenges of ai code generation</article-title>
          ,
          <source>in: Proceedings of the 54th ACM Technical Symposium on Computer Science Education V. 1</source>
          ,
          <issue>2023</issue>
          , pp.
          <fpage>500</fpage>
          -
          <lpage>506</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>