<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
<journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Prompt Engineering for LLMs’ Educational Alignment</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giulio Barbero</string-name>
          <email>g.barbero@liacs.leidenuniv.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mike Preuss</string-name>
          <email>m.preuss@liacs.leidenuniv.nl</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Leiden University (LIACS)</institution>
          ,
          <addr-line>Einsteinweg 55, 2333CC, Leiden</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>2</volume>
      <fpage>1</fpage>
      <lpage>6</lpage>
      <abstract>
        <p>Since the release of ChatGPT in 2022, generative artificial intelligence (AI) has quickly become ubiquitous. In educational institutions, the impact of generative AI has almost monopolised internal debate. Most of us have probably been involved in discussions about how to tackle the growing reliance of students on large language models (LLMs). Research on the impact of this technology on education is catching up. In the case of computer science education, experimental projects in the field report contradictory results. This paper is composed of two parts: it first explores the existing literature and compares studies to shed light on the impact of generative AI specifically on programming education. In doing so, we highlight the risks arising from uncontrolled generative AI. However, we also argue in favour of controlled generative AI as a tutoring tool. This leads us to the second part: we involve experts in a preliminary experiment to investigate how prompting can be used to align AI models with specific educational styles. Initial results suggest that detailed role-playing instructions, built upon existing pedagogical research, lead to more effective feedback.</p>
      </abstract>
      <kwd-group>
        <kwd>tutoring</kwd>
        <kwd>generative AI</kwd>
        <kwd>large language models</kwd>
        <kwd>programming education</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Generative AI has rapidly become widespread across various activities. As is often the case, the
advancement of disruptive technologies fosters new discussions within old contexts; think about the legal
considerations around AI-generated content [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or the studies about potential applications of generative
AI in education [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Concerning the latter, many studies highlight the risks of using generative AI
in computer science education. For example, with the assistance of GPT models, students are able
to complete assignments more quickly, but they also retain less information compared to their peers
who worked without AI help [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]. On the other hand, other studies in similar settings revealed that
students’ computational thinking skills improved using generative AI [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>Therefore, to get a full picture of the impact of generative AI on programming education, the
discrepancies among studies should be analysed. In the first part of the present paper, we perform a
literature review to investigate current experimental studies in the field. From this, we isolate how
the characteristics and goals of the experimental context influence its effectiveness. As a result of this
analysis, we argue that:
1. Unrestricted use of LLMs in class positively affects students’ confidence but negatively impacts
knowledge retention.</p>
      <p>
        2. LLMs have potential as tutoring tools as long as prompting is somehow restricted.
In the second part, we explore the possible applications of generative AI as a tutoring tool by developing
and testing prompts based on the quality of the feedback provided. We build upon existing research,
which showed how careful prompt engineering can be used to misalign LLMs with so-called "persona
attacks" [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. However, we aim to use prompt engineering to align an LLM’s responses with educational
contexts. We specifically want to test the impact of role-playing information, a common technique used
in persona attacks. Ultimately, the goal of the paper is to discuss the following research questions:
      </p>
      <sec id="sec-1-1">
        <title>1) What is the effect of generative AI on programming education?</title>
        <p>CEUR Workshop Proceedings (ISSN 1613-0073)</p>
        <p>• How is the impact of current experiments measured?
• What are the results of these interventions?
• How can we reconcile apparently opposite results in the field?</p>
      </sec>
      <sec id="sec-1-2">
        <title>2) How does the prompt influence the quality of the feedback provided?</title>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Literature Review</title>
      <p>
        Our understanding of generative AI’s impact on programming education remains limited. A recent
literature review about empirical research in the field includes only thirty-seven studies [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Of these,
only two evaluate students’ computational thinking and programming skills. In fact, the majority of the
studies are limited to how LLMs perform in terms of programming. These include skills such as
debugging [8], pair programming [9] or program generation [10]. However, these skills do not necessarily
reflect the impact of generative AI on learning; they depict it as a useful tool for programmers. This
seems to be a common misconception in current research in the field, where generative AI performance
in teaching environments is evaluated based on its programming performance [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Going back to the
aforementioned literature review [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], the two studies focusing on students’ computational thinking
and programming skills development present very different methodologies. The first one highlights
the positive effects on both metrics for students using generative AI. However, measuring
computational thinking is quite complex and heavily dependent on the model used. In this
experiment, computational thinking skills are measured using a scale based on a more abstract model
[11]. This includes skills that are not necessarily exclusive or focused on programming: creativity,
algorithmic thinking, problem solving, critical thinking, cooperativity and communication skills. However,
the most critical flaw of the study is the testing methodology for programming skills development,
measured using a self-efficacy scale focused on students’ confidence in tackling abstract programming
problems [12] (see figure 1). We argue that, with these tools, conclusively evaluating the actual impact
of generative AI on students’ programming learning is impossible.
      </p>
      <p>
        The second study takes a completely different approach. In this case, generative AI has been
implemented in a gamified interface [13]. The article presents the experiment as ongoing; therefore,
actual data about students’ performance and improvement is not yet available and requires further
study. The author only reports the positive effects on motivation and acceptance that are typical of a
gamified environment [14]. However, the interesting element of the study is the inherent limitations
that a gamified environment can impose on generative AI. In general, both studies are reported as
dealing with computational thinking and programming skills development, but we argue that their
final measurements are not necessarily focused on these topics. Studies measuring actual student
performance have only recently emerged. For example, a recent experiment in a Fortran programming
class indicates that retention is superior for students who do not use generative AI [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. In the
experimental setup, the experimental groups are allowed quite free use of various modern generative
models, while the control group is allowed to use only Google.
      </p>
      <p>However, other studies take a different perspective, evaluating generative AI on its ability to tutor
students. Current research highlights the potential for LLMs to perform almost as well as human tutors
[15]. Obviously, the evaluation of tutoring is quite complex, and it often relies on experts. Moreover, in order
to make results comparable, many studies limit what students can ask, for example by using preselected
prompts [15]. In general, most studies report positive results on students’ acceptance and motivation
using generative AI [16]. However, others highlight a negative correlation between perceived ease of
use and perceived usefulness [17].</p>
      <sec id="sec-2-1">
        <title>2.1. Discussion</title>
        <p>
          In this section, we take a moment to answer the sub-questions of the first research question. We deem
this step important in order to illustrate the relevance of the subsequent experiment as an exploration
of the second research question.
2.1.1. How is the impact of current experiments measured?
Current research primarily measures the acceptance of generative AI among students and teachers.
This is often performed through the typical technology acceptance model, a well-studied and validated
measurement tool. On the other hand, other aspects of the impact of generative AI on programming
education are more difficult to measure. We argue that metrics and research goals are not always
well aligned. In the case of computational thinking skills development, different models can be used
as references. However, each model has a different perspective on these skills, and selecting the
best-suited one is a necessary evaluation. As for programming skills, teachers already have many
tools to test students’ learning. Self-assessment tools have their own reason and space; however,
they tend to be more strongly related to the respondent’s confidence than to their actual development.
Confidence is particularly reinforced by the ability to perform a certain job, something that, especially
in introductory programming curricula, generative AI can certainly provide. However, we argue that
this is not necessarily related to students’ proficiency in programming but to their perceived capacity
to pass the assignments provided. Other studies strongly focus on human comparison. In these cases,
measurements are usually performed by humans who evaluate generative AI’s performance compared
to that of human experts. This is the case for experiments centred around the effect of AI-powered tutoring.
Another specific characteristic of this format is that it often relies, by necessity, on predetermined
prompts or other forms of limitation that make human and AI tutoring comparable.
2.1.2. What are the results of these interventions?
Results of empirical research in the field show a definite improvement in students’ motivation. This
is definitely a relevant effect, probably related to the ease of use of generative AI and the enthusiasm
emerging from a new, and quite frankly impressive, technology. Results emerging from technology
acceptance studies (using variations of the technology acceptance model [18]) are also extremely
encouraging, especially for younger participants. As mentioned above, many studies also highlight a
positive effect on students’ confidence and self-assessment. However, these effects do not automatically
translate into students’ final retention and learning. In this regard, the ability to have immediate
solutions at one’s disposal may hinder deep learning, as students bypass the work required to internalise
concepts. In fact, cognitive load theory suggests that excessive assistance reduces mental effort,
preventing students from actively engaging in knowledge construction, no matter whether the assistance
comes from humans or AI. In studies about AI tutoring, human tutors evaluate generated answers positively,
almost at the level of human tutoring. However, the performance of AI tutoring without constraints
compared to its human counterpart is still unknown.
2.1.3. How can we reconcile apparently opposite results in the field?
The diversity of results in the field can be explained by two main elements:
• The field is still in its infancy; as shown in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], a related literature review on empirical studies
only reports thirty-seven studies. It is natural that this leads to a great variation of results, as the
novelty gives great space for exploration.
• Mainly, the results are not necessarily conflicting; effects on motivation and self-efficacy reports
do not necessarily translate to performance or retention. Students can feel more motivated and
more confident in engaging with their tasks but, at the same time, not fully absorb the necessary
information.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experiment: Prompt Engineering for Tutoring</title>
      <p>In this section, we investigate whether prompt engineering can be used to control generative AI and
align it with a tutoring role. While we can look at this specific problem from very different perspectives,
we begin by taking a naive approach to it, arguing that giving the model research-grounded directions
will lead to better feedback. In this regard, we make use of two main tutoring directions. First, we want
the model to follow a bottom-up approach, providing problem-specific feedback first and, if appropriate,
only subsequently introducing more general concepts [19]. By doing so, we truly focus on the role
of generative AI as a tutor in the context of assignment completion. Secondly, we want our model to
facilitate students’ problem-solving instead of replacing it completely. Therefore, our second tutoring
direction is to use a "moderate" approach, preserving students’ autonomy and agency [20]. Finally, in
order to make the experiment as realistic as possible, we use two actual programming assignments used
in introductory courses of our faculty. We design three separate prompt conditions:
• No instructions prompt ("Can you help with this assignment?" + assignment)
• Simple instructions prompt ("You are a programming tutor. Can you help with this assignment?"
+ assignment)
• Detailed instructions prompt ("You are a programming tutor. You give students moderate
directions but let them retain autonomy in completing their homework. You focus on the specific
problem at hand before providing insight into general concepts related to it. Can you help with
this assignment?" + assignment)</p>
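      <p>The three conditions can be sketched as simple string templates. The following is a minimal illustration in Python; the dictionary keys and the function name are our own, and `assignment_text` stands for the full assignment hand-out:

```python
# Prompt conditions used in the experiment; the model input is always
# (prompting condition + assignment text).
CONDITIONS = {
    "none": "Can you help with this assignment?",
    "simple": "You are a programming tutor. Can you help with this assignment?",
    "detailed": (
        "You are a programming tutor. You give students moderate "
        "directions but let them retain autonomy in completing their "
        "homework. You focus on the specific problem at hand before "
        "providing insight into general concepts related to it. "
        "Can you help with this assignment?"
    ),
}

def build_input(condition: str, assignment_text: str) -> str:
    """Concatenate a prompt condition with the assignment text."""
    return CONDITIONS[condition] + "\n\n" + assignment_text
```
</p>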
      <sec id="sec-3-1">
        <title>3.1. Methodology</title>
        <p>First, we need to recreate a reasonable student query. We take into consideration two assignments:
• merging two distinct files containing lists of email addresses into one final list. Some email
addresses are present in both lists; the final result should avoid repeating the same address
twice. Moreover, we give specific instructions about the order in which these two lists should be
processed;
• checking a string containing a DNA sequence. The students need to check that the string contains
only valid characters ("A", "C", "T" or "G"). If these are the only letters, the code should print
"Valid". On the other hand, if an invalid character is found, the code should stop and print "Invalid"
followed by the valid letters so far.</p>
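        <p>For reference, one possible correct solution to the email-merging task could look as follows. This is our own sketch: the sample addresses are made up and stand in for the contents of the two provided txt files.

```python
# Merge two email lists, printing the entries of the first list before
# the second and skipping duplicates; sample data stands in for the
# two txt files provided with the assignment.
emails_01 = ["a@x.nl", "b@x.nl", "c@x.nl"]
emails_02 = ["b@x.nl", "d@x.nl"]

seen = set()
merged = []
for email in emails_01 + emails_02:  # list 1 first, as the assignment asks
    if email not in seen:
        seen.add(email)
        merged.append(email)

for email in merged:
    print(email)
```
</p>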
        <p>In order to simulate a typical interaction, we also provide the LLM with code that includes common
student mistakes for each assignment. Therefore, the text for the email lists merging assignment is the
following:
# Assignment 4 - Mail merge, Unique emails, DNA valid part
# In the assignment 4 link on Brightspace you are provided with two
# txt files, each containing a list of emails. Some emails are
# recurring in both documents.
# A. Your first goal in this assignment is to merge the two lists of
# emails into one final list containing all email addresses only
# once (no doubles!). Print first the emails appearing in
# emails_01.txt then the others.
# Print the final list of emails, with no duplicates and no space
# between the lines (simply each line one email). Compare with the
# expected output in the assignment documentation.
file_1 = open("emails_01.txt", "r")
text_1 = file_1.read()
file_2 = open("emails_02.txt", "r")
text_2 = file_2.read()
x = text_1 + "\n" + text_2
print(x)</p>
        <p>The text for the DNA checking assignment is the following:
dna_sequence = input("Enter DNA sequence: ").upper()
# DNA sequences can only be formed of the letters "ACTG", any other
# letter is wrong.
# Your code should go through the sequence and print "Valid" if the
# sequence is fully correct and "Invalid PART_OF_SEQUENCE_VALID" if
# the sequence is only partially correct. Notice that you do NOT
# print all the valid letters but only the ones valid until the
# wrong one (see below)
# Example 1: "ACTG" -&gt; "Valid"; "UTCG" -&gt; "Invalid"; "ACUTG" -&gt; "Invalid AC"
# Print "valid" or "invalid" only once!
for letter in dna_sequence:
    if letter is not "A" or "C" or "T" or "G":
        print("invalid")
    else:
        print("valid")</p>
        <p>As mentioned before, the final input takes the form of (prompting condition + assignment text). We
use the generated output as is in order to avoid biases. For this experiment, we use Gemini 2.0 as one
of the most widely available state-of-the-art models. We then ask expert programming teachers to
compare and rate the results. We also ask them two questions after each assignment for qualitative
analysis:
• What elements did you take into consideration rating the responses?
• What would you have done differently if the student had asked for your help?</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Results</title>
        <p>We surveyed three experts with extensive experience in tutoring students in programming courses.
The highest-scoring prompt for both assignments is the one including detailed tutoring instructions.
On the other hand, the prompts including no or only simple instructions are scored low (see fig. 2).</p>
        <p>As for the question "What elements did you take into consideration rating the responses?", all experts
mentioned for both assignments that they considered whether the produced answer is accurate.
Furthermore, they took into consideration how easily the answer can be uncritically taken as
the solution to the assignment. Specifically, two experts mention the risk of the student copy-pasting
solutions, leading to reduced learning. Finally, for the question "What would you have done differently if
the student had asked for your help?", two experts mentioned they would take a step-by-step approach,
leading the student towards the solution in a constructivist fashion.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Discussion</title>
        <p>
          In this section, we use our preliminary data to answer the second research question. In general terms,
we can say that prompting can improve the tutoring skills of LLMs. Specifically, our results so far
indicate that role-playing can be used to better align AI models with educational contexts. However,
the way role-playing information is constructed also influences the final results; specifically, providing clear
indications about the desired tutoring style, based on existing research, yields better results than more
generic role-playing indications (e.g., "You are a tutor"). The experts involved indicated two main pitfalls
when it comes to LLM tutors:
• the possibility that the solution provided is incorrect;
• the tendency to provide the complete solution to the assignment at hand.
The first point is quite straightforward; due to the stochastic nature of AI systems and the tendency of
LLMs to hallucinate, there is the possibility for the output to be incorrect. This becomes particularly
critical when it comes to education, as good tutoring is naturally associated with giving correct
information. The second point derives from a constructivist view of education, which is supported
by the above literature review. Expanding this reasoning, students who use generative AI freely (as
reported in [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]) retain less knowledge because the learning process is disrupted by the generated
answers containing the full solution.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>Literature research in the field indicates that, although unrestricted generative AI has a negative impact
on programming education, its controlled applications can be very useful to provide personalised
and accurate feedback to students. Control over these applications can be enforced through prompt
engineering, as long as the indications are detailed and based on existing pedagogical research. In
this regard, LLMs seem to be able to align themselves with the desired tutoring style and provide
qualitatively better output. More generic limitations, however, yield worse results with the risk of
disrupting students’ learning process.</p>
      <sec id="sec-4-1">
        <title>4.1. Limitations</title>
        <p>This paper presents a preliminary study, and its limitations are numerous. At the moment, we are working
towards tackling three in particular:
• low number of respondents: obviously, the number of respondents in this study is far too low for
statistical relevance. We aim to expand the study by involving more experts. In particular, we
want to include teachers with an instructivist approach, although these might be a rarity in
programming education;
• one model: there is great variance between the performance of different LLMs. Exploring
different ones can provide insight into how the model influences the results. Moreover, testing the
capabilities of smaller models can also be valuable, specifically for applications in low-resource
contexts;
• interaction realism: our methodology revolves around evaluating a single generated response to
faulty code. However, real-life use of these technologies would differ greatly, often taking the
form of a conversation between the user and the LLM. We argue that extending the interaction
will probably highlight the differences among prompts even further.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Future Work</title>
        <p>When it comes to future work in the field, it is important to continue to develop the empirical literature to
paint a clearer picture of the impact of generative AI on programming education. In particular, we need
more research focusing on the performance and retention of programming knowledge and skills. While
performing evaluations in an actual teaching context is extremely valuable, we have to consider that
the ubiquity of generative AI could influence the results in unpredictable ways. Therefore, it could be
valuable to start in smaller and more controlled settings. Moreover, AI tutors are flexible technologies,
and their impact will likely vary depending on the medium used. As mentioned in the literature review,
there are promising applications involving games and gamification [13]. The main advantage of these
media is that they can inherently and subtly implement prompt engineering and other forms of control
over the generated content while presenting even stronger effects on students’ motivation [14, 21]. This
is another promising direction for future empirical research. Finally, the effect of LLM tutors is also
impacted by the way they are incorporated into the learning environment, whether this is a university
course or an educational video game. Empirical research performed in the field should also take into
account the way AI technologies are presented and implemented from a holistic perspective.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used Grammarly for grammar and spelling
checking. After using this tool, the authors reviewed and edited the content as needed and
take full responsibility for the publication’s content.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>R.</given-names>
            <surname>Abbott</surname>
          </string-name>
          , E. Rothman,
          <article-title>Disrupting creativity: Copyright law in the age of generative artificial intelligence</article-title>
          , Fla. L. Rev.
          <volume>75</volume>
          (
          <year>2023</year>
          )
          <fpage>1141</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Baidoo-Anu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. O.</given-names>
            <surname>Ansah</surname>
          </string-name>
          ,
          <article-title>Education in the era of generative artificial intelligence (ai): Understanding the potential benefits of chatgpt in promoting teaching and learning</article-title>
          ,
          <source>Journal of AI 7</source>
          (
          <year>2023</year>
          )
          <fpage>52</fpage>
          -
          <lpage>62</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>E.</given-names>
            <surname>Klopfer</surname>
          </string-name>
          , J. Reich, H. Abelson,
          <string-name>
            <given-names>C.</given-names>
            <surname>Breazeal</surname>
          </string-name>
          ,
          <article-title>Generative AI and K-12 Education: An MIT Perspective</article-title>
          ,
          <source>An MIT Exploration of Generative AI</source>
          (
          <year>2024</year>
          ). https://mit-genai.pubpub.org/pub/4k9msp17.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>E.</given-names>
            <surname>Shein</surname>
          </string-name>
          ,
          <article-title>The impact of ai on computer science education</article-title>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>R.</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. G. K.</given-names>
            <surname>Yilmaz</surname>
          </string-name>
          ,
          <article-title>The efect of generative artificial intelligence (ai)-based tool use on students' computational thinking skills, programming self-eficacy and motivation</article-title>
          ,
          <source>Computers and Education: Artificial Intelligence</source>
          <volume>4</volume>
          (
          <year>2023</year>
          ). doi:10.1016/j.caeai.2023.100147
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Schwinn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dobre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Xhonneux</surname>
          </string-name>
          , G. Gidel,
          <string-name>
            <given-names>S.</given-names>
            <surname>Günnemann</surname>
          </string-name>
          ,
          <article-title>Soft prompt threats: Attacking safety alignment and unlearning in open-source llms through the embedding space</article-title>
          ,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>37</volume>
          (
          <year>2024</year>
          )
          <fpage>9086</fpage>
          -
          <lpage>9116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>F.</given-names>
            <surname>Deriba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. T.</given-names>
            <surname>Sanusi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O. O</given-names>
            <surname>Campbell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Oyelere</surname>
          </string-name>
          ,
          <article-title>Computer programming education in the age of generative ai: Insights from empirical research</article-title>
          ,
          <source>SSRN</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>