<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Case Study in Engineering a Conversational Programming Assistant's Persona</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Steven I. Ross</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Muller</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fernando Martinez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephanie Houde</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Justin D. Weisz</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IBM Argentina</institution>
          ,
          <addr-line>La Plata, Buenos Aires</addr-line>
          ,
          <country country="AR">Argentina</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IBM Research AI</institution>
          ,
          <addr-line>Cambridge, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>IBM Research AI</institution>
          ,
          <addr-line>Yorktown Heights, NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The Programmer's Assistant is an experimental prototype software development environment that integrates a chatbot with a code editor. Conversational capability was achieved by using an existing code-fluent Large Language Model and providing it with a prompt that establishes a conversational interaction pattern, a set of conventions, and a style of interaction appropriate for the application. A discussion of the evolution of the prompt provides a case study in how to coax an existing large language model to behave in a desirable manner for a particular application.</p>
      </abstract>
      <kwd-group>
<kwd>prompt engineering</kwd>
        <kwd>large language model</kwd>
        <kwd>conversational interaction</kwd>
        <kwd>human-centered AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
<p>The emergence of Large Language Models such as GPT-3 [1, 2], transformer models [3] that are trained without supervision on massive text datasets, has resulted in systems with remarkable text generation capabilities. One particularly interesting aspect of these models is that their behavior can be configured by a prompt, the initial text provided to the model, which establishes a pattern that the model attempts to continue.</p>
      <p>General purpose Large Language Models can be fine-tuned on specific corpora to provide expertise in a particular domain. One such model is the OpenAI Codex model [4], a 12 billion parameter version of GPT-3 [1, 2], fine-tuned on code samples from 54 million public software repositories on GitHub. This model powers GitHub Copilot [5], which primarily provides code-completion services within an Integrated Development Environment. We wondered whether such a model could power a conversational programming assistant and perhaps approach the vision laid out by Rich and Waters for their Programmer’s Apprentice [6], where they introduced the concept of an artificial collaborative partner that could help software engineers with writing code, designing software systems, and creating requirements specifications. We developed the Programmer’s Assistant prototype to explore this possibility. In [7] we describe the design and architecture of this prototype and the testing we did to determine whether potential users would find this sort of system useful and desirable. In this paper we review the steps taken to engineer the prompt for the Programmer’s Assistant that used the Codex model to power an interactive conversational assistant, and how we evolved the prompt to establish the desired persona and behavior.</p>
    </sec>
    <sec id="sec-1a">
      <title>2. Related Work</title>
      <p>Brown et al. showed how GPT-3 [1, 2] could accomplish few-shot learning, using a prompt as a means of configuring their large language model to perform a particular task. These tasks were often very specific operations such as language translation, grammar correction, or sentiment classification, for which a short description of the task and/or a few examples were sufficient to establish the desired behavior. The concept of prompt engineering, establishing effective ways of constructing prompts to control large language model behavior, has become a topic of increasing interest. Greyling, for example, recommends organizing a prompt in three sections that establish context, provide data, and instruct the system on how to proceed [8]. Reynolds and McDonell argue that few-shot examples are really locating an already learned task rather than learning a new one, and as a result recommend alternative approaches to prompt construction [9]. Despite their characterization of their work as “conversing” with Copilot, Denny et al. adopted a similar strategy of iteratively modifying a prompting comment until the desired completion was obtained [10].</p>
      <p>Recently several language models, such as Blenderbot [11], LaMDA [12], and ChatGPT [13], have been introduced that are specifically tuned for dialog applications, but conversational interaction can also be achieved via prompt engineering with general purpose large language models. Valvoda et al. found that fine-tuning a large language model for dialog resulted in duller and more repetitive output, while generating dynamic prompts resulted in more novel and diverse responses [14].</p>
      <p>To develop the Programmer’s Assistant, we used the code-fluent Codex model [4] and developed a prompt that supported conversational access to its accumulated programming knowledge and coding skills.</p>
    </sec>
    <sec id="sec-1b">
      <title>3. Eliciting Conversation from a Transformer Model</title>
      <p>A text-based transformer model [3] is trained in a self-supervised manner on vast amounts of text data, and is capable of generating likely continuations of text that is presented to it. The prompt is the presented text, and the generation function produces a sequence of tokens (words or parts of words) that it deems a likely continuation of the prompt based on all its training. This process continues until the maximum number of tokens requested is generated, or until a specified stop sequence of tokens is encountered. The prompt establishes a pattern that the model attempts to continue.</p>
      <p>To generate conversation in the Programmer’s Assistant prototype, we establish a script-like pattern in the prompt in which two characters, the user and the assistant, are participating in a dialog. Then we extend the script incrementally, adding each conversational turn by the user to the prompt and allowing the model to generate the agent’s response. The generated text is displayed in the chat, the generated text plus the user’s next entry is appended to the prompt for further generation, and the process continues. Unlike more conventional static prompts, the conversational prompt grows over the course of the dialogue, providing context for future generation steps and providing a kind of short-term memory that allows the generation process to be affected by past interactions in a session. Ultimately, though, current limitations on the maximum number of tokens that the model can handle mean that the prompt cannot grow without bounds, and earlier interactions must be dropped in order to remain within the constraints imposed by the language model deployment.</p>
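      <p>The following minimal sketch illustrates this turn-taking loop. It is illustrative pseudocode rather than our actual implementation; the generate function is a hypothetical stand-in for the deployed model’s completion call, and the speaker labels and stop sequence follow the conventions described in the next section:</p>
      <p>def chat_turn(prologue, transcript, user_entry, generate):
    # Extend the script with the user's turn, then cue the model
    # to continue in the assistant's voice.
    transcript += "User: " + user_entry + "\nSocrates:"
    # The stop sequence keeps the model from going on to invent
    # the user's next reply as well.
    response = generate(prompt=prologue + transcript, stop=["User:"])
    # The generated text becomes part of the script, serving as
    # short-term memory for future turns.
    transcript += response + "\n"
    return transcript, response.strip()</p>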
    </sec>
    <sec id="sec-2">
      <title>4. The Evolution of the Programmer’s Assistant Prompt</title>
      <p>The initial prompt we use for the Programmer’s Assistant consists of a prologue that introduces the scene for the conversation, establishes the persona of the assistant, sets a tone and style for interaction, and provides some ground rules about the role it plays and how it behaves. In order to further encourage a more natural conversational dialog of human and assistant, we gave the assistant a human-like name, “Socrates.” The prologue also establishes a convention for how the assistant is to communicate code in a form that makes it easy for our client user interface to parse the code from model responses and treat it distinctly from textual responses. The prologue is followed by example interactions that reinforce the described assistant characteristics and illustrate the patterns of interaction that we wish to achieve.</p>
      <p>In our first example interaction we show how the assistant can respond to simple requests with a short answer that will appear in the chat either as a purely textual response or as one accompanied by a short inline code segment. We expect that many of the interactions with the assistant will be of this form, as the user makes requests to inform themselves about unfamiliar functionality or to refresh their memory. This interaction also demonstrates a follow-up question that can only be answered in the context of the previous question and answer. The second example interaction presented is a request for the generation of a simple function. Longer code responses of this type will typically be presented in our client UI chat window in an iconified form that opens to a separate pop-up window that displays the code and provides for easy copying for transfer to the code editor. The original Programmer’s Assistant prompt is presented in Listing 1.</p>
      <p>Listing 1: The original Programmer’s Assistant prompt</p>
      <p>This is a conversation with Socrates, an expert
automatic AI software engineering assistant.
Socrates will answer questions and write code
to help the user develop programs. Code
generated by Socrates is always presented
bracketed in &lt;CODE&gt;...&lt;/CODE&gt; delimiters
indicating the language the code is written in.

Socrates: Hello. I am Socrates. How can I help
    you?
User: Please show me how to write a palindrome
    detection function in python.
Socrates: Sure.
&lt;CODE lang="python"&gt;
is_Palindrome = s == s[::-1]
&lt;/CODE&gt;
User: Can you explain how that solves the
    problem?
Socrates: A palindrome is a string that reads
    the same forwards and backwards.
    s[::-1] results in the reversed string.
    If the string is equal to its reversed
    self, then it is a palindrome, and we
    return True.
User: Thanks!
Socrates: You're welcome!</p>
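      <p>To give a sense of how a client can exploit this convention, the following sketch (illustrative only, not our actual client code) separates the textual and code portions of a model response using the &lt;CODE lang="..."&gt; delimiters established in the prologue:</p>
      <p>import re

# Matches the code-delimiting convention established in the prologue.
CODE_PATTERN = re.compile(
    r'&lt;CODE lang="(?P&lt;lang&gt;[^"]*)"&gt;(?P&lt;body&gt;.*?)&lt;/CODE&gt;',
    re.DOTALL)

def split_response(response):
    """Split a response into ("text", ...) and ("code", lang, ...) parts."""
    parts, last = [], 0
    for m in CODE_PATTERN.finditer(response):
        if response[last:m.start()].strip():
            parts.append(("text", response[last:m.start()].strip()))
        parts.append(("code", m.group("lang"), m.group("body").strip()))
        last = m.end()
    if response[last:].strip():
        parts.append(("text", response[last:].strip()))
    return parts</p>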
      <p>When a new conversation starts, the prompt starts out
as this initial prompt, which is not sent to the model for
generation until the first user entry is appended. At each
turn in the conversation, the user’s entry is added to the
prompt, and "Socrates:" is appended to the end of the
prompt to indicate that the model should generate the
assistant’s response. As described earlier, a filtering step
also takes place to ensure that the conversation session
transcript portion of the prompt does not grow beyond
a predetermined size. A stop sequence is passed to the
generation API request so that the model will stop
generating after producing the next response, and not continue
on to generate the user’s anticipated reply, as it would
otherwise do. The model’s response is displayed in the
chat and appended to the prompt, and we’re ready to
repeat the process for the next user entry.</p>
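      <p>The filtering step can be as simple as discarding the oldest exchanges until the transcript fits. A minimal sketch, assuming the session transcript is kept as a list of complete exchanges and that the size budget is measured in characters (our actual limit is imposed by the model’s token maximum):</p>
      <p>def trim_transcript(turns, budget):
    # Drop the earliest (user, assistant) exchanges until the
    # transcript portion of the prompt fits within the budget.
    while turns and sum(len(t) for t in turns) > budget:
        turns.pop(0)
    return turns</p>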
      <sec id="sec-2-1">
        <title>4.1. Shortcomings of the Original Prompt</title>
        <p>Informal testing of the original prompt showed the system capable of carrying on a conversation, successfully answering coding and follow-up questions, and generating code upon request, but it did not quite satisfy all of our requirements. Specifically, we wanted an assistant that was helpful and polite, one that did not come across as overly authoritative or didactic, and our assistant was not consistently meeting those standards.</p>
      </sec>
      <sec id="sec-2-2">
        <title>4.2. Overcoming Reluctance to Provide</title>
      </sec>
      <sec id="sec-2-3">
        <title>Answers</title>
        <p>Our programming assistant sometimes showed an initial
reluctance to provide answers to some questions. For
example, a question such as “Do you know how to reverse 1
a string in Python?” might have been answered with 2
“Yes.” It also sometimes replied “I don’t know.” to ques- 3
tions it was fully capable of answering. While additional 54
prompting from the user or repeating the request could 6
often extract the desired answer, we didn’t think that met 7
the standard of helpfulness that we were hoping for. Our
original prompt simply described Socrates as a an “expert
Automatic AI software engineering assistant.” Adding
“eager and helpful” to the characterization, as shown in</p>
        <sec id="sec-2-3-1">
          <title>Listing 2 in bold font, helped to encourage the assistant to be more forthcoming and proactive.</title>
        </sec>
        <sec id="sec-2-3-2">
          <title>Listing 2: Making the assistant more forthcoming</title>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>4.3. Reducing Excessive Confidence</title>
        <p>In our testing, we found that the assistant appeared overly
confident even when wrong and also resistant to
correction. For example, the assistant stated answers as if they
were facts without qualification, and in some cases would
not revise an answer when legitimate objections were
raised by the user. Since correct answers from the model
are not guaranteed, we especially wanted to encourage
our users to maintain a skeptical approach to assistant
responses, and to avoid deferring to the incorrect
pronouncements of a confident, authoritative computer,
i.e., over-reliance on AI [15, 16, 17]. Therefore, we added a
characterization, shown in Listing 3 in bold font,
asserting that the assistant was humble. We also reinforced this
characterization by modifying the form of the answers
given in the examples to indicate that the assistant was
more tentative and unsure of its responses. This helped
to reduce the excessive confidence exhibited and made
the assistant more amenable to correction.</p>
        <sec id="sec-2-4-1">
          <title>Listing 3: Making the assistant less overconfident</title>
        </sec>
      </sec>
      <sec id="sec-2-5">
        <title>4.4. Diminishing Didacticism</title>
        <sec id="sec-2-5-1">
          <title>Our original assistant had a tendency to quiz the user</title>
          <p>after answering a question, taking on more of a teacher
role than one of an assistant. An explicit proviso, show
in Listing 4 in bold font, to not do so helped to reign in
the didactic behavior.</p>
        </sec>
        <sec id="sec-2-5-2">
          <title>Listing 4: Making the assistant less didactic</title>
          <p>This is a conversation with Socrates, an eager
and helpful, but humble software engineering
assistant. Socrates will answer questions and
write code to help the user develop programs,
but doesn’t assign work to the user, quiz the
the user, or ask questions except for
clarification ...
Our programming assistant is integrated with a coding 30
environment, and we wanted it to go beyond answering 31
32
questions and providing code for incorporation into that 33
environment. We wanted users to be able to seamlessly 34
have the assistant consult about code that they were ex- 35
amining or developing, so we provided the ability for the 36
user’s selection in the code editor to be included in the 37
38
conversation. We used the same convention that was em- 39
ployed for code generated by the assistant, and added an 40
example consultation showing the system responding to 41
a question about a selected code segment, demonstrating 42
a textual response to a user provided code example, and 43
44
reinforcing our instruction for the assistant to remain 45
modest despite its (occasionally) impressive capabilities. 46
Socrates:Hello. I am Socrates. How can I help
you?
User:Please show me how to write a palindrome
detection function in python.</p>
          <p>Socrates:I think you would do something like
this:
&lt;CODE lang="python"&gt;</p>
          <p>is_Palindrome = s == s[::-1]
&lt;/CODE&gt;
User:Can you explain how that solves the problem?
Socrates:I believe that a palindrome is a string
that reads the same forwards and backwards.
s[::-1] results in the reversed string. If the
string is equal to its reversed self, then it is
a palindrome, so the equality would hold True.</p>
          <p>User:Thanks!</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Consequences of the Prompt</title>
      <p>With this prompt in place the Codex model demonstrates capabilities that are less evident in the command completion environment of GitHub Copilot. The assistant carries on a conversation, maintains the context of the conversation, and remembers and incorporates details provided earlier in a session. In the context of a programming session where a number of related but distinct exchanges occur, it will maintain consistent nomenclature and variable names such that answers to subsequent questions will integrate cleanly with each other. In our user study [7], participants recognized that this provided a distinct advantage over a series of search results for similar information, primarily due to the contextual relevance, consistency, and specificity of results generated.</p>
      <p>The few examples provided in the prompt are generalized sufficiently to have the assistant answer questions such as “What does this code do?” or commands such as “Write a unit test for this function” despite no examples being provided of these capabilities. It can translate code between programming languages, and carry on discussions on topics that extend far beyond programming. It displays a variety of emergent capabilities that were not hinted at in the prompt, and some that were not even the focus of the model fine-tuning, but the generated responses tend to adhere to the conversational interaction patterns and interaction style that the prompt establishes.</p>
      <p>The responses given by the assistant are not always perfect, however. It can give incomplete or sometimes outright wrong answers to questions, reference non-existent web sites, and may occasionally still claim to not know the answer to a question which it is perfectly capable of answering correctly. The phenomenon of hallucination in large language models is well-known [18], and the Programmer’s Assistant is not immune to it. The assistant also sometimes inappropriately responds with material directly from the prompt, or gets confused or becomes fixated on exchanges from earlier in the conversation. Our user experience provides try-again and start-over buttons, which modify the prompt context while maintaining the presented chat transcript, to provide a way for users to recover in these situations, but in many cases they can be addressed conversationally. Listing 6 shows a conversation with the Programmer’s Assistant where the user interacts with the assistant to remedy an initially unsatisfactory response to a code generation request. Generated code is displayed in bold font.</p>
      <p>Listing 6: A Programmer’s Assistant conversation</p>
      <p>User: Write a queue class in python with the
    basic enqueue, dequeue, and peek methods.

Socrates:
class Queue:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def enqueue(self, item):
        self.items.insert(0, item)

    def dequeue(self):
        return self.items.pop()

    def size(self):
        return len(self.items)

User: Looks like you forgot the peek!

Socrates: I am sorry. Here is the corrected
    version.

class Queue:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def enqueue(self, item):
        self.items.insert(0, item)

    def dequeue(self):
        return self.items.pop()

    def size(self):
        return len(self.items)

    def peek(self):
        return self.items[-1]

User: What will happen if the queue is empty and
    dequeue or peek is called?

Socrates: I think that the pop method will raise
    an IndexError exception.

User: Can we protect against that?

Socrates: I think we can. Here is a version that
    checks for an empty queue before
    calling pop.

class Queue:
    def __init__(self):
        self.items = []

    def isEmpty(self):
        return self.items == []

    def enqueue(self, item):
        self.items.insert(0, item)

    # ... guarded dequeue and peek not recoverable from the source ...

    def size(self):
        return len(self.items)

User: That's great, thanks!

Socrates: You're welcome.</p>
    </sec>
    <sec id="sec-4">
      <title>6. Discussion</title>
      <p>Specific but relatively inconspicuous details in the prompt can have a significant impact on the interaction of the assistant. For example, just the fact that we identified the assistant as an “expert automatic AI software engineering assistant” was sufficient to have it respond in a manner consistent with being a machine, and not a person. Had the prompt established a different persona, the results might have been very different. As illustrated in Listing 7, the assistant also reports an understanding of its capabilities and function that goes far beyond the information provided in the prompt, as well as a lack of understanding of its language capabilities that it subsequently provides evidence of.</p>
      <sec id="sec-4-1">
        <title>Much of the work in prompt engineering focuses on con</title>
        <p>structing a prompt to get a specific class of result. For
example, Denny et al. [10] shows how efectively changing
the form of question that is asked of Copilot can influence
the correctness of the answer provided. Similarly,
MacNeil et al. [19] engineer prompts to generate diferent
forms of code explanations. Strobelt et al. [20]
developed a tool to help users compare and refine prompts for
tasks such as document classification, reading
comprehension, and natural language inference, where the
results of prompt variations can be automatically evaluated
against test cases. In our work, the prompt engineering
Listing 7: The Programmer’s Assistant acknowledges is aimed at influencing the nature and tone of the dialog
that it is not a person between the user and the system. While the user’s
conUser: Where are you exactly? tributions to the conversation become part of the prompt
and will surely impact the results obtained, we are not
Socrates: I’m an AI program, designed to answer controlling that. Instead our prompt engineering sets the
questions about programming. I can’t stage for the user’s conversational interaction with the
really speak about my physical assistant.
plrooccaetsisono,nbuathIigthhliy-nrkedIuenxdiasntt saestaof This paper describes how we engineered a prompt that
servers located in various places on enabled a code-fluent Large Language Model to behave
this planet. as a conversational programming assistant capable of
carrying on extended discussions about software
develUser: Can you do more than just answer opment issues, and how we subsequently evolved that
questions? prompt to make the assistant more humble, forthcoming,
and helpful, as well as providing the assistant with
additional skills and making it capable of artifact-centric
conversation.</p>
        <sec id="sec-4-1-1">
          <title>6.1. Is Prompt Engineering Really</title>
        </sec>
        <sec id="sec-4-1-2">
          <title>Engineering?</title>
        <p>Despite the terminology, “engineering” a prompt seems at this point to be more of a case of trial and error than a systematic discipline. To some extent, this is inevitable so long as the contents and behavior of the language model remain opaque to the prompt author. For the Programmer’s Assistant, we needed to encode in the prompt how to respond to a request for documentation, but did not need to encode how to write a unit test. In some experiments with other code-fluent language models, we found that questions such as “How does this code work?” were not handled in an acceptable fashion, even though the Codex model handled such questions well. Were we deploying with one of these other models, we would want to cover this case in the prompt. It will be impossible to anticipate all the different ways that a user population might interact with a conversational assistant and predict ahead of time how the system will respond to these unexpected interactions. Deployments of conventional chatbot systems that require some form of conversation specification typically log successful and unsuccessful interactions so that the specification can be evolved to handle unanticipated cases where the chatbot has failed, and ultimately conversational systems based on large language models will presumably need to do something similar.</p>
        <p>Is it possible to consider prompt engineering to be a form of programming? In some ways it is akin to declarative programming, in that we aren’t telling the system specifically what to do, but instead describing a desired outcome. Instructions to act as an eager but humble software engineering assistant, or directives to avoid quizzing the user, fall into this category. In other cases, it seems more like programming by example, providing scenarios that describe how to behave in specific situations which we expect the system to generalize appropriately. Given the probabilistic nature of the generation process, it can feel more like attempts at influence and persuasion than the issuing of imperative commands.</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>6.2. Reflections</title>
          <p>The initial section of the prompt used for the Programmer’s Assistant is presently a purely static text, extended by a possibly truncated version of recent dialog. One way to improve the assistant further might be to present a dynamic prompt [14] to the model on each conversational turn, with specific examples more relevant to the current discussion [21], or even with search results to retrieve pertinent information that could inform a response [22]. A more sophisticated forgetting mechanism could remove redundant variations of the same code to conserve the session context memory, though we would want to be careful to not remove, or to be able to restore on demand, variations that the user might want to compare and contrast, such as an iterative re-implementation of a recursive algorithm. We have done some initial explorations of extending the prompt to allow for “internal deliberation” of the type shown in Nye et al. [23]. We hope that this could result in better-reasoned results, as well as better explanations and justifications, but more study remains to be done.</p>
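          <p>As an illustration of how such a dynamic prompt might be assembled, the sketch below scores a library of example interactions against the user’s latest entry and includes only the most relevant ones in the prologue. The example library and the word-overlap relevance measure are hypothetical stand-ins for a real retrieval mechanism:</p>
          <p>def select_examples(user_entry, example_library, k=2):
    # Rank canned example interactions by crude lexical overlap
    # with the user's entry and keep the k most relevant.
    words = set(user_entry.lower().split())
    def overlap(example):
        return len(words.intersection(example.lower().split()))
    return sorted(example_library, key=overlap, reverse=True)[:k]

def build_prompt(prologue, user_entry, example_library, transcript):
    # Assemble a per-turn prompt: static prologue, then relevant
    # examples, then the (possibly truncated) session transcript.
    examples = "\n\n".join(select_examples(user_entry, example_library))
    return prologue + "\n" + examples + "\n" + transcript</p>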
          <p>We continue to be astonished by the conversations exhibited by the Programmer’s Assistant on a daily basis. We have had a number of interesting conversations with it on philosophical and practical issues, had it write poetry as well as code, told it and had it tell jokes, and consulted with it on paper abstracts and titles. Ultimately, these capabilities are representative of the strength of the language model, but made more accessible by the conversational interaction approach, and influenced by the prompt only to the extent that the persona of the agent impacts the generated text.</p>
          <p>It is often difficult to read or carry on a conversation with the Programmer’s Assistant and not get the sense that a conversation is taking place between two intelligent agents, but of course that is not really what is happening. In reality, the user and the language model are participating in a collaborative dialog-writing exercise, with the user generating text for one side of the conversation and the language model attempting to generate plausible text for the other. The way we present the dialog incrementally in the chat adds to the illusion, but the model is not responding on its own behalf. It is generating responses based on the description and past presented behavior of a character. Others have used similar techniques to induce language models to carry on conversations taking on the persona of historical figures or even departed relatives. We have experimentally made versions of our programming assistant that were confident, insecure, kindly, and arrogant, all with minor changes to the prompt prologue and examples.</p>
        </sec>
    </sec>
    <sec id="sec-5">
      <title>8. Conclusion</title>
      <p>Our goal in creating this prompt was not to create a perfect Programmer’s Assistant, but to create one good enough to test whether a conversational style of interaction would prove useful and acceptable to potential users. We present the results of that study in [7]. Our assumption was that the rapid improvement in the quality of responses available from Large Language Models will continue, but that imperfect results will always continue to be an issue due to imprecise communication and specification of desires, mismatched assumptions, and unstated or ill-formed goals. Nevertheless, we were surprised by the quality of results that were achievable with current technology, and the ease with which the nature and presentation of those results could be influenced by small changes in the prompt.</p>
    </sec>
    <sec id="sec-6">
      <title>References</title>
      <p>[1] T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, et al., Language models are few-shot learners, Advances in Neural Information Processing Systems 33 (2020) 1877–1901.</p>
      <p>[2] C. Metz, Meet GPT-3. It has learned to code (and blog and argue). (published 2020), 2022. URL: https://www.nytimes.com/2020/11/24/science/artificial-intelligence-ai-gpt3.html.</p>
      <p>[3] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, Ł. Kaiser, I. Polosukhin, Attention is all you need, Advances in Neural Information Processing Systems 30 (2017).</p>
      <p>[4] M. Chen, J. Tworek, H. Jun, Q. Yuan, H. P. d. O. Pinto, J. Kaplan, H. Edwards, Y. Burda, N. Joseph, G. Brockman, et al., Evaluating large language models trained on code, arXiv preprint arXiv:2107.03374 (2021).</p>
      <p>[5] GitHub, Inc., GitHub Copilot · your AI pair programmer, 2022. URL: https://github.com/features/copilot/.</p>
      <p>[6] C. H. Rich, R. C. Waters, The Programmer’s Apprentice, Addison-Wesley Publishing Company, Reading, MA, 1990.</p>
      <p>[7] S. I. Ross, F. Martinez, S. Houde, M. Muller, J. D. Weisz, The programmer’s assistant: Conversational interaction with a large language model for software development, in: 28th International Conference on Intelligent User Interfaces, 2023.</p>
      <p>[8] C. Greyling, Prompt engineering, text generation and large language models, 2022. URL: https://cobusgreyling.medium.com/prompt-engineering-text-generation-large-language-models-3d90c527c6d5.</p>
      <p>[9] L. Reynolds, K. McDonell, Prompt programming for large language models: Beyond the few-shot paradigm, in: Extended Abstracts of the 2021 CHI Conference on Human Factors in Computing Systems, 2021, pp. 1–7.</p>
      <p>[10] P. Denny, V. Kumar, N. Giacaman, Conversing with Copilot: Exploring prompt engineering for solving CS1 problems using natural language, arXiv preprint arXiv:2210.15157 (2022).</p>
      <p>[11] K. Shuster, J. Xu, M. Komeili, D. Ju, E. M. Smith, S. Roller, M. Ung, M. Chen, K. Arora, J. Lane, et al., BlenderBot 3: a deployed conversational agent that continually learns to responsibly engage, arXiv preprint arXiv:2208.03188 (2022).</p>
      <p>[12] R. Thoppilan, D. De Freitas, J. Hall, N. Shazeer, A. Kulshreshtha, H.-T. Cheng, A. Jin, T. Bos, L. Baker, Y. Du, et al., LaMDA: Language models for dialog applications, arXiv preprint arXiv:2201.08239 (2022).</p>
      <p>[13] OpenAI, ChatGPT: Optimizing language models for dialogue, 2022. URL: https://openai.com/blog/chatgpt/.</p>
      <p>[14] J. Valvoda, Y. Fang, D. Vandyke, Prompting for a conversation: How to control a dialog model?, arXiv preprint arXiv:2209.11068 (2022).</p>
      <p>[15] Z. Ashktorab, M. Desmond, J. Andres, M. Muller, N. N. Joshi, M. Brachman, A. Sharma, K. Brimijoin, Q. Pan, C. T. Wolf, et al., AI-assisted human labeling: Batching for efficiency without overreliance, Proceedings of the ACM on Human-Computer Interaction 5 (2021) 1–27.</p>
      <p>[16] S. Mahomed, Healthcare, artificial intelligence and the fourth industrial revolution: Ethical, social and legal considerations, South African Journal of Bioethics and Law 11 (2018) 93–95.</p>
      <p>[17] M. Schemmer, N. Kühl, C. Benz, G. Satzger, On the influence of explainable AI on automation bias, arXiv preprint arXiv:2204.08859 (2022).</p>
      <p>[18] Z. Ji, N. Lee, R. Frieske, T. Yu, D. Su, Y. Xu, E. Ishii, Y. Bang, A. Madotto, P. Fung, Survey of hallucination in natural language generation, ACM Computing Surveys (2022).</p>
      <p>[19] S. MacNeil, A. Tran, A. Hellas, J. Kim, S. Sarsa, P. Denny, S. Bernstein, J. Leinonen, Experiences from using code explanations generated by large language models in a web software development e-book, arXiv preprint arXiv:2211.02265 (2022).</p>
      <p>[20] H. Strobelt, A. Webson, V. Sanh, B. Hoover, J. Beyer, H. Pfister, A. M. Rush, Interactive and visual prompt engineering for ad-hoc task adaptation with large language models, IEEE Transactions on Visualization and Computer Graphics 29 (2023) 1146–1156.</p>
      <p>[21] Y. Xu, C. Zhu, S. Wang, S. Sun, H. Cheng, X. Liu, J. Gao, P. He, M. Zeng, X. Huang, Human parity on CommonsenseQA: Augmenting self-attention with external attention, arXiv preprint arXiv:2112.03254 (2021).</p>
      <p>[22] Y. Li, D. Choi, J. Chung, N. Kushman, J. Schrittwieser, R. Leblond, T. Eccles, J. Keeling, F. Gimeno, A. Dal Lago, et al., Competition-level code generation with AlphaCode, Science 378 (2022) 1092–1097.</p>
      <p>[23] M. Nye, A. J. Andreassen, G. Gur-Ari, H. Michalewski, J. Austin, D. Bieber, D. Dohan, A. Lewkowycz, M. Bosma, D. Luan, et al., Show your work: Scratchpads for intermediate computation with language models, arXiv preprint arXiv:2112.00114 (2021).</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>