<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>How Data Scientists Improve Generated Code Documentation in Jupyter Notebooks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Michael Muller</string-name>
          <email>michael1_muller@us.ibm.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>April Yi Wang</string-name>
          <email>aprilww@umich.edu</email>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Steven I. Ross</string-name>
          <email>slross@us.ibm.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Justin D. Weisz</string-name>
          <email>jweisz@us.ibm.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mayank Agarwal</string-name>
          <email>Mayank.Agarwal@ibm.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kartik Talamadupula</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stephanie Houde</string-name>
          <email>Stephanie.Houde@ibm.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fernando Martinez</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>John Richards</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jaimie Drozdal</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xuye Liu</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Piorkowski</string-name>
          <email>david.piorkowski@ibm.com</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dakuo Wang</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IBM Research AI</institution>
          ,
          <addr-line>Cambridge, MA 02142</addr-line>
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IBM Research AI</institution>
          ,
          <addr-line>Cambridge, MA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>IBM Research AI</institution>
          ,
          <addr-line>Yorktown Heights, NY</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Michigan</institution>
          ,
          <addr-line>Ann Arbor, MI</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Generative AI models are capable of creating high-fidelity outputs, sometimes indistinguishable from what could be produced by human effort. However, some domains possess an objective bar of quality, and the probabilistic nature of generative models suggests that there may be imperfections or flaws in their output. In software engineering, for example, code produced by a generative model may not compile, or it may contain bugs or logical errors. Various models of human-AI interaction, such as mixed-initiative user interfaces, suggest that human effort ought to be applied to a generative model's outputs in order to improve their quality. We report results from a controlled experiment in which data scientists used multiple models – including a GNN-based generative model – to generate and subsequently edit documentation for data science code within Jupyter notebooks. In analyzing their edit-patterns, we discovered various ways that humans made improvements to the generated documentation, and speculate that such edit data could be used to train generative models to not only identify which parts of their output might require human attention, but also how those parts could be improved.</p>
      </abstract>
      <kwd-group>
        <kwd>Code documentation</kwd>
        <kwd>Generative AI</kwd>
        <kwd>Human-AI collaboration</kwd>
        <kwd>Jupyter notebooks</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <sec id="sec-1-1">
        <p>
          For several decades, scholars have explored how humans and computers might collaborate [1, 2, 3, 4, 5]. Early work largely focused on a zero-sum “trade-off” model in which a finite conceptual pool of “initiative” was to be split between human and computer. Typical approaches asked, in effect, “who goes first?”, and many models went no further than a single cycle of human-initiates-and-AI-responds or AI-initiates-and-human-responds (e.g., [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]).
        </p>
        <p>
          More recent work has deconstructed the older concept of unitary “initiative” into a flexible and collaborative framework in which increased initiative by one party (e.g., the human) does not imply a decrease of initiative by the other (e.g., the AI) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. In addition, the “mixed-initiative creative interface” (MICI) framework analyzed by Deterding et al. [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] and Spoto and Oleynik [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], further developed by Muller et al. [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], specifically details how human and AI partners interact in creative tasks as a series of back-and-forth exchanges.
        </p>
        <p>
          In this paper, we examine how humans interact with a generative AI model in the context of writing data science documentation. We specifically aim to extend the human-initiates-and-AI-responds interaction pattern to include a step in which a human may make subsequent edits to the outputs of the model. Prior work by our team has explored how data scientists use various kinds of models – including a GNN-based generative model – to aid the task of adding documentation to data science code in Jupyter notebooks [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. In this paper, we conduct a deeper analysis of the edits made by participants in that study, to understand the nature of their edit-patterns and how they “compensate” for or “augment” the output of the generative models. Through a thematic analysis, we developed a classification of participants’ edit-patterns.
        </p>
        <p>We found that 85% of participants’ edit-patterns fully accepted the algorithmically-generated text, or built upon the generated text; only 15% of the instances involved a complete rewrite. At first glance, these results suggest that the generated text was well-accepted. However, participants modified the generated text in 41% of the cases. Thus, (1) a human requests the generation of text; (2) the AI provides that text; (3) the human decides to use – or not to use – that text; and (4) the human may go on to edit the generated text. In the future, these three-to-four dialogic “moves” could become the basis for an extended conversation between human and AI.</p>
        <p>In this paper, we develop a taxonomy of edit-patterns, discovering that some edits added missing details while other edits explained the function of the code. A third category of edits was primarily concerned with modifying the formatting or style of the documentation.</p>
      </sec>
      <sec id="sec-1-2">
        <title>2. Background</title>
        <p>We discuss recent work in the area of AI and machine learning applied to data science and software engineering, as well as the application of generative models to this domain. We also discuss recent studies on human interactions with generative models in software engineering.</p>
        <sec id="sec-1-2-1">
          <title>2.1. AI and Machine Learning in Software Engineering</title>
          <p>
            In recent years, techniques from AI and machine learning have been applied to various tasks in software engineering, including code completion [
            <xref ref-type="bibr" rid="ref10 ref11 ref12 ref9">9, 10, 11, 12</xref>
            ], code translation [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ], code classification [
            <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
            ], API recommendation [
            <xref ref-type="bibr" rid="ref16 ref17">16, 17</xref>
            ], variable and method naming [
            <xref ref-type="bibr" rid="ref18 ref19">18, 19</xref>
            ], type inference [
            <xref ref-type="bibr" rid="ref20 ref21">20, 21</xref>
            ], bug detection and repair [22, 23, 24, 25, 26, 27], comment description and generation [28, 29, 30, 31, 32, 33], code change summarization [34], and code clone detection [35]. Allamanis et al. [36] provide a comprehensive review of the use of AI and machine learning within data science and software engineering. The emergence of generative AI techniques for natural language, such as GPT-2 [37] and GPT-3 [38], has also been reflected in code-centric use cases: Brockschmidt et al. [39] proposed the use of generative models for source code, and Tufano et al. [40] used generative models to fix bugs. In this paper, we conduct a deeper analysis of participant interactions with Wang et al. [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]’s Themisto documentation-generation system, which incorporates a GNN-based generative model for generating comments from code.
          </p>
        </sec>
        <sec id="sec-1-2-2">
          <title>2.2. Human-AI Collaboration with Generative Models in Data Science and Software Engineering</title>
          <p>In a recent study, Weisz et al. [41] examined the use of an unsupervised neural machine translation (NMT) model in addressing a task in application modernization, specifically regarding translating code from a legacy language to a modern one. They found that software engineers would be tolerant of errors or mistakes in the output of an NMT model, as the code produced by that model would be subject to the same review and testing procedures as code produced by everyone else on their team. In addition, a code highlighting feature that indicated where in the code human attention might be needed, based on an aggregation of the model’s token-level confidences, was very desirable. Although a subsequent analysis by Agarwal et al. [42] demonstrated how current metrics of model confidence may not necessarily correlate with external truth of code quality (approximated by lint errors), the ability for a generative model to “ask for help” via code highlights was nonetheless highly valued by software engineers. In this work, we push even further by seeking to understand whether a generative model can also specify what kind of help it needs from its human user.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>3. Method</title>
      <p>
        In order to understand the co-creation process of data science documentation, we examined Themisto [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], a prototype code documentation generation system that supports data scientists in writing documentation for computational notebooks. Wang et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] conducted an evaluation of Themisto with 24 data science professionals. In this section, we briefly discuss how Themisto generates documentation from code, as well as the user study setup and data collection methodology.
      </p>
      <p>(While Jupyter notebooks have been used in education, we note that these notebooks are increasingly the basis of commercial products [43, 44], as in offerings by IBM [https://developer.ibm.com/components/jupyter/] and Microsoft [https://notebooks.azure.com/].)</p>
      <sec id="sec-2-0-1">
        <title>3.1. Themisto: A System for Automatic Documentation Generation</title>
        <p>We implemented the automatic documentation generation system as an extension to JupyterLab (Figure 1). The extension generates three types of documentation for a given code snippet. The first type of documentation is generated using a Graph Neural Network based approach [45], which is commonly used in code summarization tasks. The second type of documentation is generated by retrieving relevant external API documentation for a code function (e.g., functions defined in Pandas, Numpy, and Scikit-learn). Lastly, the extension also provides a prompt-based approach, where users are given a short prompt to manually create the documentation. For example, if a code cell contains a graphic output, the extension would generate a prompt to ask users to interpret the output.</p>
      </sec>
      <sec id="sec-2-0-2">
        <title>3.2. User Study Setup</title>
        <p>We are interested in how users make revisions on the suggested explanations. Thus, we recruited 26 data scientists to add documentation to a given draft notebook using the prototype. We prepared two draft notebooks with the same length (9 code cells) and similar levels of difficulty, but for two different problems. Each participant was randomly given one of the two notebooks and asked to document the notebook within 12 minutes; two participants failed to complete the task within that time period, and were excluded from further analysis. Before the study, we provided a quick demo of the functionality of the extension.</p>
      </sec>
      <sec id="sec-2-0-3">
        <title>3.3. Data Collection</title>
        <p>We collected the completed final notebooks (N=24) after participants finished the task. All study sessions were conducted remotely using a teleconferencing tool, and we recorded participants’ screens with their permission. After the session, we conducted a retrospective interview to ask about their experience and feedback.</p>
        <p>We wanted to understand how participants used the algorithmically-generated documentation. For orientation, we review that 13 participants made documentation choices and optional modifications in each of nine markdown cells in each of two notebooks (“Covid” and “house”) – i.e., 13 participants for each notebook. In the data from each notebook, we discovered two participants (per notebook) who did not complete the task, or who created extra cells. We could not be certain how to “map” these extra cells to the structure that was common across the other 11 participants. Because we wanted to compare participants’ responses in a disciplined way, we treated each of these people (who created extra cells) as outliers, and excluded them from analysis. This exclusion left us with 11 participants per notebook who had worked with the algorithmically-generated documentation.</p>
      </sec>
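<p>The participant accounting above can be double-checked with a few lines of arithmetic. This is an illustrative sketch using only the figures reported in Sections 3.2 and 3.3; the variable names are ours, not the study's.</p>

```python
# Worked check of the participant accounting (figures from the text;
# identifier names are ours, for illustration only).
recruited = 26                      # data scientists recruited
per_notebook = recruited // 2       # 13 participants per notebook
excluded_per_notebook = 2           # non-completers or extra-cell outliers
analyzed_per_notebook = per_notebook - excluded_per_notebook
analyzed_total = 2 * analyzed_per_notebook
markdown_cells = 9                  # markdown cells per notebook
coded_texts = analyzed_total * markdown_cells

print(analyzed_per_notebook, analyzed_total, coded_texts)
```

<p>This reproduces the counts used later in the analysis: 11 participants per notebook, 22 participants overall, and 198 coded markdown cells.</p>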
      <sec id="sec-2-1">
        <title>3.3.1. Preparatory Analysis</title>
        <p>To prepare the documentation for analysis, we grouped all of the texts for each cell together (separately for each notebook). We then used a bag-of-words method to identify words (tokens) that were not included in the algorithmically-generated documentation. In the quotations in this paper, the participant-introduced words appear in bracketed [blue-ink]. (Readers who use a screenreader may want to consult https://doccenter.freedomscientific.com/doccenter/doccenter/rs25c51746a0cc/2012-06-20_TextFormatting/02_TextFormatting.htm for information on how to access font-attributes through JAWS.) Table 1 provides an illustrative example. This was a slightly conservative method for identifying new text, because we might fail to detect that a participant had typed “data” (for example) in their own usage, rather than including the word “data” from the algorithmically-generated text. However, we used this method only to orient ourselves to the texts.</p>
        <p>We next read each text by each participant. After reading all of the texts, two researchers agreed upon a codebook of edit-patterns. One researcher then applied that codebook rigorously to all of the texts.</p>
        <p>(Library documentation referenced in Section 3.1: Pandas, https://pandas.pydata.org/docs/reference/index.html; Numpy, https://numpy.org/doc/stable/reference/; Scikit-learn, https://scikit-learn.org/stable/modules/classes.html.)</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.3.2. Reference Notation</title>
        <p>We identify each text in terms of the participant number (1–26, with two non-completers and two outliers excluded), the notebook (“covid” or “house”), and the cell in the notebook (1–9). For example, “p21-house+4” refers to participant 21, in the “house” notebook, in the 4th markdown cell.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>4. Results: High-Level Quantitative Comparisons</title>
      <p>Participants accepted the algorithmically-generated documentation unchanged in 45% of the cells, and they edited the algorithmically-generated documentation in 41% of the cells. The remaining 9% of the cells were left blank. In this workshop paper, we present new analyses to examine the edit-patterns in the 41% of the cells with participant-edited documentation.</p>
      <p>There were few statistically significant differences between participant data from the two notebooks. We briefly report those analyses here, before the qualitative analyses that are the core of this paper. Because we did not find differences between the two notebooks, we will then perform content analyses of participants’ text on the combined data from the two notebooks in Sections 5 and 6.</p>
      <sec id="sec-4-1">
        <title>4.1. Starting Points for Documentation</title>
        <p>We provided three different Sources of documentation: AI, Query, and Prompt. A chi-square analysis found no significant differences in the proportions of Sources chosen by participants in each notebook. We also looked at combinations of Sources – i.e., no discernable source vs. a single source vs. multiple sources. Again, a chi-square analysis showed no significant differences between the notebooks.</p>
      </sec>
      <sec id="sec-4-1-1">
        <title>4.2. Edit-Patterns</title>
        <p>We describe distinct Edit-Patterns below in Sections 5 and 6. Here, we briefly state that we used chi-square tests to examine whether participants used different edit-patterns between the two notebooks. Only one of the edit-patterns showed a significant difference, and that pattern was concerned with the format (not the content) of the documentation (i.e., levels of headers in the markdown cell).</p>
        <p>Similarly to the “combinations” analysis of Section 4.1, we found no significant differences across notebooks for cells with zero edit-patterns, a single edit-pattern, or multiple edit-patterns.</p>
      </sec>
    </sec>
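<p>The chi-square tests of independence reported above can be sketched as follows. This is an illustrative implementation only; the contingency counts below are invented placeholders, not data from the study.</p>

```python
# Illustrative chi-square test of independence, of the kind used to compare
# the two notebooks. The counts are hypothetical, NOT study data.
def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of row and column factors
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof

# Hypothetical counts of documentation Source (AI, Query, Prompt) per notebook:
stat, dof = chi_square([[40, 30, 29],    # "covid" notebook (invented)
                        [38, 33, 28]])   # "house" notebook (invented)
print(round(stat, 3), dof)
```

<p>The resulting statistic would be compared against the chi-square critical value for the table's degrees of freedom; in practice one would use a library routine such as scipy's <monospace>chi2_contingency</monospace> for the same computation.</p>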
    <sec id="sec-5">
      <title>5. Results: Content-Related Edit-Patterns</title>
      <p>The preceding quantitative analyses showed only a single, stylistic difference in participants’ work with the two notebooks. We therefore combine our qualitative analyses of edit-patterns across the two notebooks.</p>
      <p>We manually coded the edit-patterns in each participant’s text in each markdown cell, according to our codebook (Section 3.3.1). For the 22 participants in nine markdown cells, we thus coded 99 texts in each notebook, for a total of 198 coded markdown cells. We applied an informal version of thematic analysis [46], noting Braun’s and Clarke’s advice that there are multiple ways of conducting a thematic analysis [47]. Previous grounded theory and thematic analysis studies have involved from 6 to 74 participants [48, 49], and so our sample of 22 participants is within that conventional range. Within this sample, we used the saturation practices of Guest et al. [50] (recommended by Ando et al. [46]) and Majid et al. [51], defining saturation by a code that appeared from at least two participants. We made no restrictions on the number of codes that we identified in a single text. Thus a text might have zero codes if the participant simply accepted the algorithmically-generated documentation, or it might have as many as three or four different codes in complex cases.</p>
      <sec id="sec-6-1">
        <title>5.1. Details Edit-Patterns (three subcategories)</title>
        <p>Participants expanded on the generated documentation by adding details. There were three subcategories of details: Contextual information, information about This-step (the current step), and information about Subsequent steps.</p>
        <sec id="sec-6-2">
          <title>5.1.1. Contextual Details</title>
          <p>Contextual details could take several forms. P03-house clarified how the prior steps had produced materials that were used in the current step:</p>
          <p>• ### create the target and the test data [re-create training] and test [datasets based on] the [size of] the [original training dataset] (p03-house+7)</p>
          <p>By contrast, P21-covid focused on the treatment of missing values:</p>
          <p>• ### check for [any] missing values [note] that [province/state have quite a few] missing (p021-covid+5)</p>
          <p>and P01-covid provided an even-more-detailed account of the same issue, with commentary on what they had observed:</p>
          <p>• check for [missing/null] values for [some countries/regions there is no province/state data this is probably correct and not a flaw in] the [data] (p01-covid+5)</p>
          <p>(We note that P01-covid edited out the markdown formatting command, “###”. We will have more to say about this kind of stylistic edit-pattern, below.)</p>
          <p>The preceding pair of examples suggests that participants may solve the same problem, in the same notebook cell, in different ways. We found many examples of different strategies and/or different conceptions of what the intended reader would need to know, such as this contrast between P13-covid’s rather minimalist description:</p>
          <p>• ### replace a specified phrase [(_)] with another specified phrase [( ) then transform] the [datatype to int] (p13-covid+4)</p>
          <p>vs. P01-covid’s much more extensive description:</p>
          <p>• ### data [preparation in] the [training] data [set] replace the [dashes] with [spaces for] the [date column and] convert the data [type to] integer (P01-covid+4)</p>
        </sec>
        <sec id="sec-6-3">
          <title>5.1.2. This-step</title>
          <p>This-step edit-patterns occurred in many distinct subcategories. The first subcategory is the addition of a few words to clarify the current step:</p>
          <p>• ### importing libraries ### importing [the necessary] libraries (p20-covid+1)</p>
          <p>While P20-covid’s addition might be only a matter of emphasis, other simple additions provided much more specific information about what was being done in the step. P08-covid changed the meaning of the generated documentation by adding specificity about what value was being computed:</p>
          <p>• ### check [number of] the missing values (p08-covid+5)</p>
          <p>P03-house provided a different kind of specificity about the types of datasets being used:</p>
          <p>• ### read the [training and test datasets] (p03-house+2)</p>
          <p>P13-covid engaged in the same type of edit-pattern (in the other notebook), but included much greater detail:</p>
          <p>• ### read a comma-separated values (csv) file into dataframe [of training] data [and test] data return the first 5 rows [of] the [training] data (p13-covid+2)</p>
          <p>We contrast P08-covid, P03-house, and P13-covid, who were adding information about what was being calculated or input, vs. P07-house and P26-house, who described how the operations were done:</p>
          <p>• ### create [train] and test data [by splitting dataframe] (p07-house+7)</p>
          <p>• ### create the target and the test data and [use slicing] (p026-house+7)</p>
          <p>In some cases, participants added specific algorithmic details that none of the generated texts had included. A repeated example was to mention (and sometimes discuss) root mean square error and its importance:</p>
          <p>• ### evaluate a score by cross-validation [uses rmse as an evaluation metric] with [5-fold] cross-validation (p03-house+9)</p>
          <p>• ### evaluate a score by [5-fold] cross-validation [using rmse] (p15-house+9)</p>
        </sec>
        <sec id="sec-6-4">
          <title>5.1.3. Next-Step</title>
          <p>We found a third edit-pattern that anticipated the next step (i.e., a subsequent cell) in the notebook. P04-covid briefly stated the use that the inputted data would be put to:</p>
          <p>• ### read [and sanity check] the data (p04-covid+2)</p>
          <p>However, in other cases, the participant provided a much richer description of the next steps, as in this recitation by P05-covid:</p>
          <p>• ### Model A random forest is a meta estimator that fits a number of decision tree classifiers on various subsamples of the dataset and uses averaging to improve the predictive accuracy and control over-fitting the [first line below initiates] a model [instance] and the [second line] fits the model on the [training data] (p05-covid+8)</p>
          <p>P02-covid and P14-house went further, documenting the nature of source data files and their formats, and also the functional significance of additional modules:</p>
          <p>• read the [data: from] the [two files: ‘traincsv‘ and ‘testcsv‘ they contain] data [in csv format now ‘train‘ contains] the [train] data [and ‘test‘ contains] the [test] data [start on] the [train] data first (p02-covid+2)</p>
          <p>• ### importing libraries - pandas for [dataframes (like excel spreadsheet)] - numpy for [fast vector operations - sklearn] for [simple] data analysis [(in] this [case linear model)] (p14-house+1)</p>
        </sec>
      </sec>
      <sec id="sec-6-5">
        <title>5.2. Explanation Edit-Patterns</title>
        <p>Sometimes participants went beyond simple details, writing a more extended Explanation. P02-house and P15-house provided brief examples, in which they added operational explanations of how to perform the activity in the cell:</p>
        <p>• ### return the first 5 rows [(defvalue=5)] (p02-house+3)</p>
        <p>• ### [separate train] and test [subsets post feature engineering set] the target [as saleprice] (p15-house+7)</p>
      </sec>
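<p>Several of the participant examples quoted in this section document the same underlying notebook operation: removing dashes from a date string and converting the result to an integer (e.g., p13-covid+4, P01-covid+4, p05-covid+4). A minimal pure-Python sketch of that operation, on invented sample values (the study notebooks applied the equivalent pandas operations, a replace on the date column followed by a cast to integer):</p>

```python
# Invented sample values; the documented notebook cells performed the
# equivalent transformation on a pandas 'date' column.
dates = ["2020-03-01", "2020-03-02"]
date_ints = [int(d.replace("-", "")) for d in dates]
print(date_ints)
```

<p>This is the kind of small, concrete step whose purpose participants spelled out in their edits.</p>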
      <sec id="sec-6-8">
        <title>The Explanation edit-pattern brings in diferent types</title>
        <p>of information, including operational aspects and
extended explanatory material about data files and
programmatic resources. As we noted above, with more
data, we may discover that Explanations may need to
be combined with Details. Another possibility is that
Explanations may turn out to be a subset of Tutorial
edit-patterns, which we describe in the next subsection.</p>
        <sec id="sec-6-8-1">
          <title>5.3. Tutorial Edit-Patterns</title>
          <p>5.1.4. Details Patterns Summary
We have shown three edit-patterns in which participants
have provided more detail than was available in the
generated texts. These patterns might be considered as
spanning an imagined audience’s reading experience. In some • ### convert [training] data [remove dashes (‘-‘) in] the
cases, participants wrote Contextual information into [dates this is done by applying] the [‘replace‘ function
the generated texts. This contextual information was ‘astype‘ sets] the [‘date‘ column to integer type]
(p05generally retrospective - i.e., what should the reader have covid+4)
known in order to understand the code? In contrasting
cases, participants focused less on context, and more on Similarly, P03-house gave instructions about how to work
content within the current cell (This-step). Finally, in a with several datasets, including which columns (factors)
few cases, participants wrote to anticipate the next cell or were involved and how to process those columns:
cells. In the Discussion, we will think further about this • [## dataset preparation] the [next few cells prepare]
kind of participants’ mental model of their audience’s the [train and test datasets] ### concatenate the [train
experiences. and test datasets] with a [subset of columns
(mssub</p>
          <p>We also wish to acknowledge that some of our category class to salecondition)] is [format(a b)] (p03-house+4)
boundaries are fuzzy. The next category of edit-patterns
poses the question - what is the diference between a De- In a diferent cell, P05-covid explained the meaning of a
tails edit-pattern and an Explanation edit-pattern? With function call and gave further instruction about how to
further research, we may need to redraw this boundary. use the results of the function:</p>
        </sec>
      </sec>
      <sec id="sec-6-9">
        <title>In a more complex pattern, participants appeared to be</title>
        <p>teaching the reader how to do the analysis. For example,
P05-covid provide detailed explanations about how to
carry-out a series of operations in python:
• ### check the missing values detect the [number of]
missing values for [each column ‘isnull()‘ returns] an
[array of indicators of whether each value in a column
is] missing [and ‘sum()‘ calculates] the [total number
of] missing values [along] that [column] (p05-covid+5)
P11-house taught how a conceptual operation worked
and also gave advice about the naming of the statistical
action:
• [#####] this code cell is for [handling] missing
values [which are replaced with] the [mean value] for
[that feature] this is [also known as column-wise
meanimputation] (p11-house+6)
Finally, we note that P24-covid took a somewhat diferent
tutorial strategy. They left the original generated text
intact as provided by the algorithm, and then added a link to
more information about the python code-structures that
were used in the cell that the algorithm had described:
• p24-covid/expt ### replace a specified phrase with
another specified phrase [[for more information about
lambda](https://realpythoncom/python-lambda/)]
(p24covid+4)</p>
        <p>Tutorial edit-patterns went much further than
Explanation edit-patterns (which themselves had gone further
than Details edit-patterns). Tutorial edit-patterns provide
not only how-to information, but also interpretations of
the meaning or purpose of actions, and in one case a link
to further information.</p>
        <sec id="sec-6-9-1">
          <title>5.4. Rewriting Edit-Patterns</title>
          <p>In the preceding subsection, we began to describe a
dimension of increasing complexity in the information that
participants provided, and also (we infer) increasing
effort on the part of the informants to think "beyond" what
was given in the generated text. The last edit-pattern in
this series involves even more "beyond" work: beyond
previous transformations of the generated text, and
probably beyond previous levels of efort. We call this
editpattern "Rewriting," because it involves nearly complete
replacement of the generated text.</p>
          <p>It may be that the generated text in certain cells led to
more instances of Rewriting. For example, three different
participants rewrote the generated text of cell 4 of the
House notebook:
• ### [join features from train and test into one df] (p15-house+4)
• ### [transform and clean] the [data] (p22-house+4)
• ### [concat train and test col salecondition] (p02-house+4)</p>
        </sec>
      </sec>
      <sec id="sec-6-10">
        <p>Similarly, two people rewrote the contents of cell 9 of the
Covid notebook:
• ### [make predictions] (p17-covid+9)
• ### [test] to [see how] the [model performs] (p20-covid+9)</p>
      </sec>
      <sec id="sec-6-11">
        <p>While it initially appears that Rewritten cells were
relatively brief, we found that two other participants Rewrote
the same cell (cell 9 of the Covid notebook) at much greater length:
• ### [run] the [model] to [generate predictions] on the
[test data] and [store them as a ‘dataframe‘] (p04-covid+9)
• use the [trained model] to [predict] the [target] on the
[test data] (p05-covid+9)</p>
      </sec>
      <sec id="sec-6-12">
        <p>In some cases, participants Rewrote the generated text at a
higher level of sophistication:
• ### [one hot encode] the [features] (p15-house+5)</p>
      </sec>
      <sec id="sec-6-13">
        <p>And in one case, the participant Rewrote in a very summary
fashion, only listing a series of steps:
• [modeling process - subsetting data - cleaning data
getting rid of nulls - model training] (p14-house+4)</p>
        <p>In the Rewriting edit-pattern, we see diverse strategies,
ranging from brief summaries to extensive new text, as
well as high-level abstractions. Significantly, in multiple
cases, participants made distinctly different Rewritings
of the same generated text (i.e., in the same cell of the
notebook). Thus, while the category of the edit-pattern is
the same, the individual strategies can be quite different.
We recall that we saw analogous patterns in the This-step
Details edit-patterns of subsection 5.1.2. While there
may be agreement among participants that the generated
text in certain cells requires a certain type of change,
participants clearly adopt different strategies about how
to make those changes.</p>
        <sec id="sec-6-13-1">
          <title>5.5. Content Edit-Patterns Summary</title>
          <p>In this part of the Results, we have examined how
participants changed the contents of the generated texts. Figure
2 summarizes a dimension that runs from simple Details,
to more complex Explanations, to instructive Tutorials,
and finally to complete Rewritings of the generated text.
Collectively, participants have an extensive repertoire of
edit-patterns that they apply to particular problems in
particular documentation cells. We next examine more
stylistic changes that participants applied to generated
text.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>6. Results: Stylistic Edit-Patterns</title>
      <sec id="sec-7-1">
        <title>6.1. Modifying document hierarchies</title>
        <p>Sometimes in combination with other edit-patterns,
participants modified the markdown formatting of the
generated texts. Initially, all markdown texts were
provided at the same hierarchical level (###). In multiple
cases, participants modified those levels, placing texts in
super-order / sub-order relation to one another:
• [#####] this code cell is for [handling] missing
values [which are replaced with] the [mean value] for
[that feature] this is [also known as column-wise mean-imputation] (p11-house+6)
• [##### here we] evaluate the [square-root of] the [5-fold
cross-validated mean-squared-error of] the [trained]
model with the [training set ‘(x_train y)‘] (p11-house+9)
• ### [we now show] the [predicted values] (p24-covid+9)</p>
        <sec id="sec-7-1-2">
          <p>P17-covid modified the header markdown specification
when combining the contents of two different forms of
generated text:
• ### [create] a [classifier ####] random forest [classifier]
(p17-covid+8)</p>
        </sec>
        <sec id="sec-7-1-3">
          <p>Further research will be needed to understand if these
stylistic/formatting edits are related to changes to the
words in the documentation.</p>
        </sec>
      </sec>
      <sec id="sec-7-2">
        <title>6.2. Completing a Sentence</title>
        <p>Some of the changes to content appeared to clarify what
was being done in the code. The primary subcategory of
these changes was to add a verb to a noun-phrase:
• ### [fit] the model (p01-covid+8)
• ### [train the] model (p10-house+8)</p>
        <sec id="sec-7-1-1">
          <p>While editing the same cell, P02-house, P07-house, and
P15-house added the same verb, but then made different
modifications to the object of that verb:
• ### [fit regression] model (p02-house+8)
• ### [fit] a lasso linear model [to the training data]
(p07-house+8)
• ### [train] lasso [cv] linear model (p15-house+8)</p>
        </sec>
        <sec id="sec-7-2-1">
          <p>Some participants added the verb in a different position
in the sentence. In these two examples, we see P21-covid
and P24-covid modifying the same generated text, but
with different verbs:
• ### model [creating] (p21-covid+8)
• ### model [training] (p24-covid+8)</p>
          <p>However, participants also engaged in more complex
ways of completing a sentence. For example, P01-covid
both added a verb and changed the object of that verb:
• [generate] the [predictions] (p01-covid+9)</p>
          <p>There were also even more complex cases, in which it
is not clear if the participant’s purpose was to complete
a sentence. Here, we repeat one example of P17-covid
from the previous subsection, which illustrates our point
about the ambiguity of complex cases:
• ### [create] a [classifier ####] random forest [classifier]
(p17-covid+8)
• ### [leverage] the random forest model and [fit] the
model [with training] dataset [(a] random forest is
a meta estimator that fits a number of decision tree
classifiers on various sub-samples of the dataset and
uses averaging to improve the predictive accuracy and
control [over-fitting)] (p13-covid+8)</p>
          <p>Footnote 7: We use ”verb” and ”object” in the technical senses of
English-language grammar (e.g., [52]). A ”verb” performs an action. An
”object” receives the effect of that action.</p>
        </sec>
      </sec>
      <sec id="sec-7-3">
        <title>6.3. Conversational Tone</title>
        <sec id="sec-7-3-1">
          <p>We observed a further stylistic modification which
appeared to make the generated text more conversational.
To avoid asserting our own judgments of what
”conversational” might mean, we show only examples in which the
participant added a personal pronoun, typically ”we” or
”you”. For ease in reading, we have bolded those pronouns
in the following examples:
• importing libraries: [in] this code [segment you import
the python] libraries [first that include ‘numpy‘] and
[‘panda‘ you also import] a [class from sklean if you
need to display some warning import the warnings]
library [as shown] (p21-covid+1)
• ### [define] and [configure] the model a random forest
is a meta estimator that fits a number of decision tree
classifiers on various sub-samples of the dataset and
uses averaging to improve the predictive accuracy and
control over-fitting [ we also train] the model [with
‘fit()‘] (p04-covid+8)</p>
        </sec>
      </sec>
      <sec id="sec-7-4">
        <title>6.4. Stylistic Edit-Patterns Summary</title>
        <p>We acknowledge that the distinction between content
and style is far from clear (e.g., [53]). Therefore, we
consider that our current categorization of Content-Related
edit-patterns and Stylistic edit-patterns may require
revision. With larger datasets, we may for example
conclude that Conversational Tone is more related to Tutorial
changes, and less related to ”style.” The same may occur
with Completing a Sentence. We will also need to
understand better the relationship of header-styles to content
in brief documentation text snippets.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>7. Results: Summary Statistics of Edit-Patterns</title>
    </sec>
    <sec id="sec-9">
      <sec id="sec-9-1">
        <p>We computed the percentage of the Content-edited cells
in which each of the above Content edit-patterns appeared.</p>
        <p>Details edit-patterns were the most frequent. This may be
unsurprising, because these kinds of edit-patterns took
relatively little effort (Figure 2). In general, edit-patterns
that were most costly of effort (Tutorial, Rewritten) had
lower frequencies of occurrence, with Rewritten
edit-patterns occurring in fewer than 15% of the edited cells.</p>
        <p>These results have implications for the design of
algorithmic documentation systems. Taken together with
our prior work on TransCoder, we can also see emergent
ideas about how people can understand and make use
of AI outcomes, even in the absence of formal
explanatory systems (e.g., Explainable AI, or XAI). Finally, these
two projects point us toward important questions in the
design of future human-AI collaborative systems.</p>
        <p>Note: A single cell could contain multiple edit-patterns.
Therefore, sums of percentages may not be meaningful.</p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>8. Discussion</title>
      <sec id="sec-10-1">
        <title>8.1. Learning from Participants’ Improvements</title>
      </sec>
      <sec id="sec-10-2">
        <sec id="sec-10-2-1">
          <p>The results we reported on participants’ editing patterns
lead us to think about a few implications for further
improving the automatic documentation approach. First, the
Details edit-patterns, Explanation edit-patterns, and
Tutorial edit-patterns are relevant to the purpose of the
notebook and the target audience. We believe that a
future version of the generative approach should tailor
the automatic documentation to the usage
scenario. Data scientists could benefit from multiple candidate
documentation texts with varying levels of detail.</p>
          <p>With a larger dataset, we could associate these
edit-patterns with particular patterns in the
algorithmically-generated texts. Based on those associations, we could
modify the algorithms to anticipate the kinds of edits
that humans have previously made (e.g., [54]). For
example, if we can remove the need for Details-related and
Conversational-Tone edits, then humans can focus on
higher-value editing, such as Tutorials. We may then see
emergent categories of even more task-specific and/or
domain-oriented edit-patterns, when humans no longer
need to put work into less significant edits.</p>
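          <p>As a minimal illustration of learning from users’ modifications, the following Python sketch treats the similarity between a generated text and the user’s final text as a reward signal. This is our own hypothetical illustration, not the paper’s implemented system; the function names and the example feedback log are invented for exposition.</p>
```python
# Sketch (hypothetical): turning users' edits to generated documentation
# into a training signal. Higher similarity between the generated text
# and the user's final text is treated as a higher reward.
from difflib import SequenceMatcher

def edit_reward(generated, edited):
    """Similarity ratio in [0, 1]; 1.0 means the user kept the text as-is."""
    return SequenceMatcher(None, generated, edited).ratio()

# Hypothetical log of (generated, user-edited) pairs from a notebook session.
feedback = [
    ("load the data", "load the data"),               # accepted verbatim
    ("model", "fit the model on the training data"),  # heavily rewritten
]
rewards = [edit_reward(g, e) for g, e in feedback]
# Pairs with low reward mark the cells whose generated documentation
# most needed human rework, and so are candidates for retraining.
```
          <p>In a reinforcement-learning setting, such a reward could be fed back to the documentation generator so that it anticipates, rather than requires, the most common human edits.</p>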
          <p>One way to do this is to include a reinforcement
learning component in the algorithm that could learn from
users’ modifications to the proposed texts. Our current
GNN model relies on the size of the training dataset to
ensure the quality of the results. However, data science code
snippets are patternless and are of limited use for
generating explanations. In the future, we can combine deep
reinforcement learning with our current GNN model
[55] to improve the performance and generalizability of
the results. This can help provide consistent stylistic
documentation in terms of the writing style, sentence
structure, and level of details.</p>
          <sec id="sec-10-2-2">
            <title>8.2. Flawed Generative Outcomes can be Useful Outcomes</title>
            <p>
              In our earlier generative documentation paper [
              <xref ref-type="bibr" rid="ref8">8</xref>
              ], we
learned that people accepted algorithmically-produced
documentation in 45% of the cells. In this workshop
paper’s analysis of the 41% of edited cells, participants
retained at least part of the generated text in over 85% of
the cells (Details, Explanations, Tutorials). The fact that
they chose to do the extra work to Rewrite in only 15%
of the cells is evidence that they mostly chose to work
with imperfect text rather than to replace it.
            </p>
            <p>This outcome is consistent with our previous study of
code translation, in which engineers reported that they
preferred an imperfect translation to no translation at all.
While we hope to improve our generative algorithms in
both research programs, we also envision future studies
in which we will calibrate the quality of the outcomes, to
determine the threshold of ”poor quality” below which an
algorithm should not be deployed. We could then perhaps
provide a more ”skeletal” outcome, such as an outline of
documentation rather than full-text. We could also treat
”poor-quality” instances as higher-priority opportunities
for algorithmic improvement.</p>
          </sec>
          <sec id="sec-10-2-3">
            <title>8.3. Deepening Human-AI Collaborations</title>
            <p>
              We now consider each of our research programs (translation
and documentation) in terms of published patterns
of human-AI collaboration [
              <xref ref-type="bibr" rid="ref1 ref2 ref4 ref5 ref6 ref7">6, 1, 2, 4, 7, 5</xref>
              ]. In both of
our studies, the human provides some initial information
(source code in both cases), and the generative algorithm
responds with a proposed outcome text (target code or
documentation, respectively). After that, with minimalist
support, the human has to make their own way - e.g., by
choosing among alternative translations for sections of
the target code, or by manually editing the documentation.
            </p>
            <p>
              These patterns remain consistent with simple initiative
models (e.g., [
              <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
              ]), and fall short of the richer on-going
interactions of some of the experimental MICI
applications [
              <xref ref-type="bibr" rid="ref1 ref5">1, 5</xref>
              ], with their potential of AI-augmentation in
support of skilled human work. We anticipate that
future versions of both projects could move toward longer
and richer exchanges between human and AI (e.g., [
              <xref ref-type="bibr" rid="ref6">6</xref>
              ]).
            </p>
            <p>
              If the human edits the target code in the TransCoder
project, then a secondary AI (e.g., [
              <xref ref-type="bibr" rid="ref12">12</xref>
              ]) might assist by
generating completions of the human’s new code or
suggesting additional modifications to the target to remain
consistent with the changes. This secondary AI would be
”aware” of the original code, and could provide additional
type-ahead support, advice, or consultation as needed.
Similarly, if the human edits the generated text in the
Documentation project, then a secondary AI (e.g., [56])
could provide assistance with Details types of edits, but
could also provide language-quality (Stylistic) support
for more complex Tutorial accounts, or even narratives
(e.g., [57]).
            </p>
          </sec>
          <sec id="sec-10-2-4">
            <title>8.4. How Does a Generative AI Model Ask for Help?</title>
            <p>Our work highlights an opportunity for enriching human
interactions with generative models. At a base level, a
generative model takes input (e.g. code) and produces
output (e.g. documentation for that code). Agarwal et al.
[42] demonstrate how a generative model can produce
confidence scores alongside its output, and Weisz et al.
[41] show the utility such scores can have in steering
human attention toward reviewing portions of the output
in which the model has low confidence. In this work, we
demonstrate how an understanding of the nature
of human edit-patterns to a generative model’s output
can enable a generative model to identify not only where
human attention is needed, but also how human effort
can be used to improve the quality of its output.</p>
            <p>One of the interesting questions will be exactly how
to choose among those alternatives - i.e., when is
type-ahead useful, how should a watchful but respectful
AI intervene with advice, and what dialogic or other
communication structures should be involved in an on-going
AI-human consultation [58, 59, 60]? Our experiences
with the NMT algorithm in the TransCoder experiment
showed that, with a sufficiently broad beam-search [61],
we could generate a manageable set of alternative
translations, which could be compared using an algorithm like
[62] to determine regions of agreement between the
translations as well as regions of uncertainty, along with the
alternatives considered for the uncertain regions. These
alternatives could then serve as informal explanations
- e.g., ”Q: Why is the output marked as uncertain in this
region? A: Because the algorithm considered multiple
possible translations at this point, and this is what they were.”</p>
            <p>The GNN in the Documentation project might also be
modified to produce multiple possible texts, with
similar explanatory power (”Q: Why is this documentation
marked as uncertain...” ).</p>
            <p>
              If there are multiple, alternative outcomes with no
emergent ”most-probable” alternative, then this could
become an initiation point at which the algorithm
detects the need for human assistance. One way to
implement this request-for-help is as a ”human-in-the-loop”
paradigm, in which the human responds to the needs
of the algorithm. We also envision situations in which
the human may be in the midst of editing code or text,
and may ask the algorithm to serve as a text-assistant
for the human’s on-going editing work. This could
become an example of the ”AI in the loop” paradigm that
we discussed at last year’s workshop. In these ways, we
could move from the relatively single-process ”initiative”
models of [
              <xref ref-type="bibr" rid="ref2 ref4 ref7">2, 4, 7</xref>
              ], and toward a more collaborative and
on-going series of interactions as in [
              <xref ref-type="bibr" rid="ref1 ref5">1, 5</xref>
              ].
            </p>
          </sec>
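          <p>The comparison of alternative generated texts described above can be sketched in a few lines of Python. This is only an illustration under our own assumptions: it uses the standard-library SequenceMatcher as a stand-in for a Myers-style difference algorithm [62], and the candidate strings and function name are invented for exposition, not taken from either system.</p>
```python
# Sketch: flag regions where two alternative generated texts disagree,
# in the spirit of comparing beam-search candidates to find regions of
# agreement and regions of uncertainty. Names here are illustrative.
from difflib import SequenceMatcher

def uncertain_spans(reference, alternative):
    """Return (start, end) character spans of `reference` that fall
    outside the blocks it shares verbatim with `alternative`."""
    matcher = SequenceMatcher(None, reference, alternative, autojunk=False)
    spans, prev_end = [], 0
    for block in matcher.get_matching_blocks():
        if block.a != prev_end:  # gap before this shared block: disagreement
            spans.append((prev_end, block.a))
        prev_end = block.a + block.size
    return spans

candidates = [
    "fit the model on the training data",
    "fit the model on the test data",
]
spans = uncertain_spans(candidates[0], candidates[1])
# Each span marks text the two candidates do not share; the shared
# remainder can be presented to the user as the region of agreement.
```
          <p>Spans with no disagreement would be shown as settled output, while non-empty spans could trigger the request-for-help interactions discussed above.</p>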
        </sec>
      </sec>
      <sec id="sec-10-3">
        <title>8.5. Limitations</title>
        <sec id="sec-10-3-1">
          <p>In order to conduct a controlled study, we sacrificed
ecological validity. We asked participants to document
someone else’s notebook. By contrast, the canonical case in
Jupyter notebooks is to document one’s own code. A
future goal should be a more naturalistic practice of
documenting one’s own notebook.</p>
          <p>Paradoxically, for precision of evaluation, we may also
need to perform an even more controlled study, in which
each person receives only one algorithm’s text at a time.
This approach could help us to assess each algorithmic
approach more independently than in the preliminary
experiment in this workshop paper.</p>
          <p>We also note that our analytic method could be
strengthened in future research. Our bag-of-words approach was
insensitive to word-order, and we looked only at
patterns of added words. Future work should also examine
patterns of deleted words.</p>
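          <p>To make the analytic method concrete, the following is a minimal Python sketch of the bag-of-words comparison described above: added words are those whose count in the edited text exceeds their count in the generated text, ignoring word order. The function name and example strings are our own illustrations.</p>
```python
# Sketch: order-insensitive comparison of generated vs. edited text,
# recovering only the pattern of added words (deleted words would need
# the symmetric difference, which this sketch deliberately omits).
from collections import Counter

def added_words(generated, edited):
    gen_counts = Counter(generated.lower().split())
    edit_counts = Counter(edited.lower().split())
    return edit_counts - gen_counts  # multiset difference: words added

added = added_words("the model", "fit the model on the training data")
# `added` now counts the words the participant introduced, including a
# second occurrence of "the", while words kept unchanged drop out.
```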
          <p>Finally, we note the obvious sampling weaknesses. We
conducted a relatively small study in a single institution.
We hope to examine similar practices in other settings,
and with more participants.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-11">
      <title>9. Conclusion</title>
      <p>In this workshop paper, we have addressed topics in
human-AI collaboration in data science and software
engineering. We reported text analytic results from a study
of generative documentation, showing that participants
accepted generated text with or without modification in
the majority of instances. These results are consistent
with our earlier work, in which engineers were
enthusiastic about using imperfect NMT-generated translations
of software code. Similarly, participants in this study
were also quite ready to accept or to work with imperfect
GNN-generated texts. We also analyzed the edit-patterns
in the generated text, developing categories that suggest
future work directions. Finally, going beyond early
unidirectional models of ”initiative,” we sketched promising
directions for longer-term, on-going human-AI
collaborations.
[35] M. White, M. Tufano, C. Vendome, D. Poshyvanyk, man Factors in Computing Systems, 2019, pp. 1–13.</p>
      <p>Deep learning code fragments for code clone detec- [49] A. Pradhan, B. Jelen, K. A. Siek, J. Chan, A. Lazar,
tion, in: 2016 31st IEEE/ACM International Con- Understanding older adults’ participation in design
ference on Automated Software Engineering (ASE), workshops, in: Proceedings of the 2020 CHI
ConIEEE, 2016, pp. 87–98. ference on Human Factors in Computing Systems,
[36] M. Allamanis, E. T. Barr, P. Devanbu, C. Sutton, A 2020, pp. 1–15.</p>
      <p>survey of machine learning for big code and natu- [50] G. Guest, A. Bunce, L. Johnson, How many
interralness, ACM Computing Surveys (CSUR) 51 (2018) views are enough? an experiment with data
sat1–37. uration and variability, Field methods 18 (2006)
[37] A. Radford, J. Wu, R. Child, D. Luan, D. Amodei, 59–82.</p>
      <p>I. Sutskever, Language models are unsupervised [51] M. A. A. Majid, M. Othman, S. F. Mohamad, S. A. H.
multitask learners, OpenAI blog 1 (2019) 9. Lim, Achieving data saturation: evidence from a
[38] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Ka- qualitative study of job satisfaction, Social and
plan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sas- Management Research Journal 15 (2018) 66–77.
try, A. Askell, et al., Language models are few-shot [52] B. S. Azar, D. A. Azar, R. S. Koch, Understanding
learners, arXiv preprint arXiv:2005.14165 (2020). and Using English Grammar: Workbook, Longman,
[39] M. Brockschmidt, M. Allamanis, A. L. Gaunt, 2000.</p>
      <p>O. Polozov, Generative code modeling with graphs, [53] G. Lakof, A figure of thought, Metaphor and
arXiv preprint arXiv:1805.08490 (2018). symbol 1 (1986) 215–225.
[40] M. Tufano, C. Watson, G. Bavota, M. D. Penta, [54] K.-H. Zeng, M. Shoeybi, M.-Y. Liu, Style
exampleM. White, D. Poshyvanyk, An empirical study on guided text generation using generative
adversarlearning bug-fixing patches in the wild via neural ial transformers, arXiv preprint arXiv:2003.00674
machine translation, ACM Transactions on Soft- (2020).
ware Engineering and Methodology (TOSEM) 28 [55] P. Almasan, J. Suárez-Varela, A. Badia-Sampera,
(2019) 1–29. K. Rusek, P. Barlet-Ros, A. Cabellos-Aparicio, Deep
[41] J. D. Weisz, M. Muller, S. Houde, J. Richards, S. L. reinforcement learning meets graph neural
netRoss, F. Martinez, M. Agarwal, K. Talamadupula, works: exploring a routing optimization use case,
Perfection not required? human-ai partnerships in arXiv (2019) arXiv–1910.</p>
      <p>code translation, in: Proceedings of IUI 2021, 2021. [56] C. Rafel, N. Shazeer, A. Roberts, K. Lee, S. Narang,
[42] M. Agarwal, K. Talamadupula, S. Houde, F. Mar- M. Matena, Y. Zhou, W. Li, P. J. Liu, Exploring the
tinez, M. Muller, J. Richards, S. Ross, J. D. Weisz, limits of transfer learning with a unified
text-toQuality estimation &amp; interpretability for code trans- text transformer, arXiv preprint arXiv:1910.10683
lation, arXiv preprint arXiv:2012.07581 (2020). (2019).
[43] M. B. Kery, M. Radensky, M. Arya, B. E. John, B. A. [57] R. Kazman, G. Abowd, L. Bass, P. Clements,
Myers, The story in the notebook: Exploratory Scenario-based analysis of software architecture,
data science using a literate programming tool, in: IEEE software 13 (1996) 47–55.</p>
      <p>Proceedings of the 2018 CHI Conference on Human [58] E. Horvitz, Uncertainty, action, and interaction: In
Factors in Computing Systems, 2018, pp. 1–11. pursuit of mixed-initiative computing, Intelligent
[44] J. M. Perkel, Why jupyter is data scientists’ com- Systems (1999) 17–20.</p>
      <p>putational notebook of choice., Nature 563 (2018) [59] S. Ross, E. Brownholtz, R. Armes, Voice user
in145–147. terface principles for a conversational agent, in:
[45] A. LeClair, S. Haque, L. Wu, C. McMillan, Improved Proceedings of the 9th International Conference on
code summarization via a graph neural network, Intelligent User Interfaces, 2004, pp. 364–365.
arXiv preprint arXiv:2004.02843 (2020). [60] S. Ross, E. Brownholtz, R. Armes, A
multiple[46] H. Ando, R. Cousins, C. Young, Achieving satura- application conversational agent, in: Proceedings
tion in thematic analysis: Development and refine- of the 9th International Conference on Intelligent
ment of a codebook, Comprehensive Psychology 3 User Interfaces, 2004, pp. 319–321.
(2014) 03–CP. [61] C. Wilt, J. Thayer, W. Ruml, A comparison of greedy
[47] V. Braun, V. Clarke, Using thematic analysis in search algorithms (2010).</p>
      <p>psychology, Qualitative research in psychology 3 [62] E. Myers, An o(nd) diference algorithm and its
(2006) 77–101. variations, Algorithmica 1 (1986) 251–266.
[48] C. M. Baker, L. R. Milne, R. E. Ladner,
Understanding the impact of tvis on technology use and
selection by children with visual impairments, in:
Proceedings of the 2019 CHI Conference on
Hu</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Deterding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hook</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Fiebrink</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gillies</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gow</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Akten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Smith</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Liapis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Compton</surname>
          </string-name>
          ,
          <article-title>Mixed-initiative creative interfaces</article-title>
          ,
          <source>in: Proceedings of the 2017 CHI Conference Extended Abstracts on Human Factors in Computing Systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>628</fpage>
          -
          <lpage>635</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Horvitz</surname>
          </string-name>
          ,
          <article-title>Principles of mixed-initiative user interfaces</article-title>
          ,
          <source>in: Proceedings of the SIGCHI conference on Human Factors in Computing Systems</source>
          ,
          <year>1999</year>
          , pp.
          <fpage>159</fpage>
          -
          <lpage>166</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weisz</surname>
          </string-name>
          , W. Geyer,
          <article-title>Mixed initiative generative AI interfaces: An analytic framework for generative AI applications</article-title>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Parasuraman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. B.</given-names>
            <surname>Sheridan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Wickens</surname>
          </string-name>
          ,
          <article-title>A model for types and levels of human interaction with automation</article-title>
          ,
          <source>IEEE Transactions on systems, man, and cybernetics-Part A: Systems and Humans</source>
          <volume>30</volume>
          (
          <year>2000</year>
          )
          <fpage>286</fpage>
          -
          <lpage>297</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Spoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Oyelnik</surname>
          </string-name>
          ,
          <article-title>Library of mixed initiative creative interfaces</article-title>
          ,
          <source>http://mici.codingconduct.cc</source>
          ,
          <year>2017</year>
          . [Online; accessed 21-December-2020].
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Biles</surname>
          </string-name>
          , Genjam:
          <article-title>Evolution of a jazz improviser, in: Creative evolutionary systems</article-title>
          , Elsevier,
          <year>2002</year>
          , pp.
          <fpage>165</fpage>
          -
          <lpage>187</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>B.</given-names>
            <surname>Shneiderman</surname>
          </string-name>
          ,
          <article-title>Human-centered artificial intelligence: Reliable, safe</article-title>
          &amp; trustworthy,
          <source>International Journal of Human-Computer Interaction</source>
          <volume>36</volume>
          (
          <year>2020</year>
          )
          <fpage>495</fpage>
          -
          <lpage>504</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Drozdal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Muller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Weisz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dugan</surname>
          </string-name>
          , Themisto:
          <article-title>Towards automated documentation generation in computational notebooks</article-title>
          ,
          <source>arXiv preprint arXiv:2102.12592</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>A.</given-names>
            <surname>Hindle</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. T.</given-names>
            <surname>Barr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Su</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gabel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Devanbu</surname>
          </string-name>
          ,
          <article-title>On the naturalness of software</article-title>
          , in:
          <source>2012 34th International Conference on Software Engineering (ICSE)</source>
          , IEEE,
          <year>2012</year>
          , pp.
          <fpage>837</fpage>
          -
          <lpage>847</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>V.</given-names>
            <surname>Raychev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vechev</surname>
          </string-name>
          ,
          <string-name><given-names>E.</given-names> <surname>Yahav</surname></string-name>,
          <article-title>Code completion with statistical language models</article-title>
          , in:
          <source>Proceedings of the 35th ACM SIGPLAN Conference on Programming Language Design and Implementation</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>419</fpage>
          -
          <lpage>428</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bruch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Monperrus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mezini</surname>
          </string-name>
          ,
          <article-title>Learning from examples to improve code completion systems</article-title>
          , in:
          <source>Proceedings of the 7th Joint Meeting of the European Software Engineering Conference and the ACM SIGSOFT Symposium on the Foundations of Software Engineering</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>213</fpage>
          -
          <lpage>222</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Svyatkovskiy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Sundaresan</surname>
          </string-name>
          , Intellicode compose:
          <article-title>Code generation using trans-</article-title>
          [23]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pradel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sen</surname>
          </string-name>
          ,
          <article-title>Deepbugs: A learning approach former</article-title>
          , arXiv preprint arXiv:
          <year>2005</year>
          .
          <volume>08025</volume>
          (
          <year>2020</year>
          ).
          <article-title>to name-based bug detection</article-title>
          ,
          <source>Proceedings of the</source>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>B.</given-names>
            <surname>Roziere</surname>
          </string-name>
          , M.
          <article-title>-</article-title>
          <string-name>
            <surname>A. Lachaux</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Chanussot</surname>
          </string-name>
          , G. Lam- ACM
          <source>on Programming Languages</source>
          <volume>2</volume>
          (
          <year>2018</year>
          )
          <fpage>1</fpage>
          -
          <lpage>25</lpage>
          . ple, Unsupervised translation of programming lan- [24]
          <string-name>
            <given-names>M.</given-names>
            <surname>Vasic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kanade</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Maniatis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bieber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          , guages,
          <source>Advances in Neural Information Processing Neural program repair by jointly learning to localSystems 33</source>
          (
          <year>2020</year>
          ).
          <article-title>ize and repair</article-title>
          , arXiv preprint arXiv:
          <year>1904</year>
          .01720
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>L.</given-names>
            <surname>Mou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <surname>Z. Jin,</surname>
          </string-name>
          (
          <year>2019</year>
          ).
          <article-title>Convolutional neural networks over tree struc-</article-title>
          [25]
          <string-name>
            <given-names>E.</given-names>
            <surname>Dinella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <surname>K.</surname>
          </string-name>
          <article-title>Wang, tures for programming language processing</article-title>
          , in: Hoppity: Learning graph transformations to detect D.
          <string-name>
            <surname>Schuurmans</surname>
            ,
            <given-names>M. P.</given-names>
          </string-name>
          Wellman (Eds.),
          <article-title>Proceed- and fix bugs in programs</article-title>
          ,
          <source>in: International Conferings of the Thirtieth AAAI Conference on Arti- ence on Learning Representations</source>
          ,
          <year>2019</year>
          . ifcial Intelligence,
          <source>February 12-17</source>
          ,
          <year>2016</year>
          , Phoenix, [26]
          <string-name>
            <given-names>M.</given-names>
            <surname>White</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Tufano</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martinez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Monperrus</surname>
          </string-name>
          , Arizona, USA, AAAI Press,
          <year>2016</year>
          , pp.
          <fpage>1287</fpage>
          -
          <lpage>D</lpage>
          . Poshyvanyk,
          <source>Sorting and transforming program 1293</source>
          . URL: http://www.aaai.org/ocs/index.php/ repair ingredients via deep learning code similariAAAI/AAAI16/paper/view/11775. ties, in: 2019 IEEE 26th International Conference
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V.</given-names>
            <surname>Jayasundara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. D. Q.</given-names>
            <surname>Bui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Jiang</surname>
          </string-name>
          , D. Lo,
          <article-title>on Software Analysis, Evolution and Reengineering Treecaps: Tree-structured capsule networks for (SANER)</article-title>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>479</fpage>
          -
          <lpage>490</lpage>
          . program source code processing, arXiv preprint [27]
          <string-name>
            <given-names>V. J.</given-names>
            <surname>Hellendoorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Maniatis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sutton</surname>
          </string-name>
          , arXiv:
          <year>1910</year>
          .
          <volume>12306</volume>
          (
          <year>2019</year>
          ). D. Bieber, Global Relational Models of Source Code,
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lillicrap</surname>
          </string-name>
          , I. Sutskever,
          <string-name>
            <given-names>S.</given-names>
            <surname>Levine</surname>
          </string-name>
          , Continu- in: International Conference on
          <article-title>Learning Represenous deep q-learning with model-based acceleration</article-title>
          , tations,
          <year>2020</year>
          . in: International Conference on Machine Learning, [28]
          <string-name>
            <given-names>L.</given-names>
            <surname>Moreno</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Aponte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Sridhara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <string-name>
            <surname>L.</surname>
          </string-name>
          <year>Pol2016</year>
          , pp.
          <fpage>2829</fpage>
          -
          <lpage>2838</lpage>
          . lock,
          <string-name>
            <given-names>K.</given-names>
            <surname>Vijay-Shanker</surname>
          </string-name>
          ,
          <article-title>Automatic generation of</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>N. D. Q.</given-names>
            <surname>Bui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <surname>L. Jiang,</surname>
          </string-name>
          <article-title>SAR: learning natural language summaries for java classes, in: cross-language API mappings with little knowl- 2013 21st International Conference on Program edge</article-title>
          , in: M.
          <string-name>
            <surname>Dumas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Pfahl</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          <string-name>
            <surname>Apel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Russo</surname>
          </string-name>
          <article-title>Comprehension (ICPC)</article-title>
          , IEEE,
          <year>2013</year>
          , pp.
          <fpage>23</fpage>
          -
          <lpage>32</lpage>
          . (Eds.),
          <source>Proceedings of the ACM Joint Meeting</source>
          <volume>on</volume>
          [29]
          <string-name>
            <given-names>S.</given-names>
            <surname>Iyer</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Konstas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Cheung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zettlemoyer</surname>
          </string-name>
          , European Software Engineering Conference and
          <article-title>Summarizing source code using a neural attention Symposium on the Foundations of Software En- model, in: Proceedings of the 54th Annual Meeting gineering</article-title>
          ,
          <source>ESEC/SIGSOFT FSE</source>
          <year>2019</year>
          ,
          <article-title>Tallinn, Es- of the Association for Computational Linguistics tonia</article-title>
          ,
          <source>August 26-30</source>
          ,
          <year>2019</year>
          , ACM,
          <year>2019</year>
          , pp.
          <fpage>796</fpage>
          - (Volume
          <volume>1</volume>
          :
          <string-name>
            <surname>Long</surname>
            <given-names>Papers)</given-names>
          </string-name>
          ,
          <year>2016</year>
          , pp.
          <fpage>2073</fpage>
          -
          <lpage>2083</lpage>
          . 806. URL: https://doi.org/10.1145/3338906.3338924. [30]
          <string-name>
            <given-names>S.</given-names>
            <surname>Scalabrino</surname>
          </string-name>
          , G. Bavota,
          <string-name>
            <given-names>C.</given-names>
            <surname>Vendome</surname>
          </string-name>
          , M. Linaresdoi:
          <volume>10</volume>
          .1145/3338906.3338924.
          <string-name>
            <surname>Vásquez</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <string-name>
            <surname>Poshyvanyk</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <string-name>
            <surname>Oliveto</surname>
          </string-name>
          , Automatically
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Allamanis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sutton</surname>
          </string-name>
          ,
          <article-title>A convolutional assessing code understandability: How far are we?, attention network for extreme summarization of in: 2017 32nd IEEE</article-title>
          /ACM International Conference source code, in: International conference on ma- on
          <source>Automated Software Engineering (ASE)</source>
          , IEEE, chine learning,
          <year>2016</year>
          , pp.
          <fpage>2091</fpage>
          -
          <lpage>2100</lpage>
          .
          <year>2017</year>
          , pp.
          <fpage>417</fpage>
          -
          <lpage>427</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>U.</given-names>
            <surname>Alon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zilberstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          , E. Yahav, code2vec: [31]
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Xia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Jin</surname>
          </string-name>
          ,
          <article-title>Deep code comLearning distributed representations of code, Pro- ment generation</article-title>
          ,
          <source>in: 2018 IEEE/ACM 26th Interceedings of the ACM on Programming Languages national Conference on Program Comprehension</source>
          <volume>3</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>29</lpage>
          . (ICPC), IEEE,
          <year>2018</year>
          , pp.
          <fpage>200</fpage>
          -
          <lpage>20010</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>V. J.</given-names>
            <surname>Hellendoorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bird</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. T.</given-names>
            <surname>Barr</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Allamanis</surname>
          </string-name>
          , [32]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Yang</surname>
          </string-name>
          , G. Xu,
          <string-name>
            <given-names>H.</given-names>
            <surname>Ying</surname>
          </string-name>
          , J. Wu,
          <article-title>Deep learning type inference</article-title>
          ,
          <source>in: Proceedings of P. S. Yu, Improving automatic source code sumthe</source>
          <year>2018</year>
          <article-title>26th acm joint meeting on european soft- marization via deep reinforcement learning</article-title>
          ,
          <source>in: ware engineering conference and symposium on Proceedings of the 33rd ACM/IEEE International the foundations of software engineering</source>
          ,
          <year>2018</year>
          , pp.
          <source>Conference on Automated Software Engineering</source>
          ,
          <fpage>152</fpage>
          -
          <lpage>162</lpage>
          .
          <year>2018</year>
          , pp.
          <fpage>397</fpage>
          -
          <lpage>407</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Goyal</surname>
          </string-name>
          , G. Durrett,
          <string-name>
            <surname>I. Dillig</surname>
          </string-name>
          , Lambdanet: [33]
          <string-name>
            <given-names>U.</given-names>
            <surname>Alon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Brody</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          , E. Yahav, code2seq:
          <article-title>GenProbabilistic type inference using graph neural net- erating sequences from structured representations works</article-title>
          , arXiv preprint arXiv:
          <year>2005</year>
          .
          <volume>02161</volume>
          (
          <year>2020</year>
          ). of code, arXiv preprint arXiv:
          <year>1808</year>
          .
          <volume>01400</volume>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>B.</given-names>
            <surname>Ray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Hellendoorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Godhane</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Tu</surname>
          </string-name>
          ,
          <string-name>
            <surname>A</surname>
          </string-name>
          . Bac- [34]
          <string-name>
            <given-names>L.</given-names>
            <surname>Moreno</surname>
          </string-name>
          , G. Bavota,
          <string-name>
            <given-names>M. Di</given-names>
            <surname>Penta</surname>
          </string-name>
          , R. Oliveto, chelli, P. Devanbu,
          <article-title>On the "naturalness" of buggy A</article-title>
          .
          <string-name>
            <surname>Marcus</surname>
          </string-name>
          , G. Canfora,
          <article-title>Automatic generation of code</article-title>
          , in: 2016 IEEE/ACM 38th International Con- release notes,
          <source>in: Proceedings of the 22nd ACM ference on Software Engineering (ICSE)</source>
          , IEEE,
          <year>2016</year>
          , SIGSOFT International Symposium on Foundations pp.
          <fpage>428</fpage>
          -
          <lpage>439</lpage>
          . of Software Engineering,
          <year>2014</year>
          , pp.
          <fpage>484</fpage>
          -
          <lpage>495</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>