<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GiottoBugFixer: an efective and scalable easy-to-use framework for fixing software issues in a DevOps pipeline</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Placido Pellegriti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carmine Cisca</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Previtali</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>AlmavivA S.p.A.</institution>
          ,
          <addr-line>Via di Casal Boccone 188/190, Rome, 00137</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Developing software is one of the most important and crucial activity in the IT domain. It is an important, challenging and time consuming activity due to many factors that spaces from software complexity up to testing and deployment phases. In the past decades, a plethora of tools have been released for helping developers in coding faster, however they are now becoming inefective and unable to keep up with the change afecting the IT development. This paper investigates the potential of generative AI in the realm of software development, focusing on how these technologies can augment the coding process, from initial concept to final deployment. It begins by delineating the fundamental mechanisms through which generative AI models, such as code completions and automated code generation can enhance developer productivity, reduce error rates and streamline the software development lifecycle. We conducted an experimentation on several repositories obtaining around 25% of software issues automatically fixed with a 17x speed up.</p>
      </abstract>
      <kwd-group>
        <kwd>eol&gt;Platform Engineering</kwd>
        <kwd>Software Automation</kwd>
        <kwd>Generative AI</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>2. Related Work</title>
      <p>
        Developing an automatic code fixer is key for enhancing
programming productivity [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and is an active area of
research [
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ].
      </p>
      <p>
        This trend has gained increasing popularity in recent
years. Examples include Google’s Tricorder [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ],
Facebook’s Getafix [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Zoncolan and Microsoft’s Visual
Studio IntelliCode. The techniques underlying these tools
can be classified into broadly two categories: logical,
rulebased techniques [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and statistical, data-driven
techniques [
        <xref ref-type="bibr" rid="ref6">7, 6, 8</xref>
        ]. The former uses manually written rules
capturing undesirable code patterns and scans the entire
codebase for these classes of bugs. The latter learns to
detect abnormal code from a large code corpus using
deep neural networks.
      </p>
      <p>
        Despite great strides, however, both kinds of tools are
limited in generality because they target error patterns in
specific codebases or they target specific bug types. For
instance, Zoncolan’s rules are designed to be specifically
applicable to Facebook’s codebases, and deep learning
models target specialized bugs in variable naming [7]
or binary expressions [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Moreover, the patterns are
relatively syntactic, allowing them to be specified by
human experts using logical rulesor learnt from a corpus
of programs.
      </p>
      <p>In this paper, we propose an efective and scalable
easyto-use framework for fixing software issues in a DevOps
pipeline by means of an LLM model (i.e., GPT3.51).</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>In the rapidly evolving field of software engineering,
understanding the intricacies of the software development
process is crucial for delivering high-quality, eficient and
reliable software solutions. This paper delves into the
comprehensive study of the software development
lifecycle, focusing on pivotal aspects such as code quality,
implementation and testing. By dissecting these elements,
we aim to ofer insights into optimizing the development
process, ensuring that software not only meets but
exceeds the rigorous demands of applications to be realized.</p>
      <p>At the heart of any software project lies the quality
of its code, which serves as the cornerstone for
functionality, maintainability, and scalability. We explore
methodologies and practices such as code reviews, static
code analysis, and adherence to coding standards that
contribute to enhancing code quality. By integrating
these practices, developers can reduce bugs, facilitate
easier updates, and ensure a robust foundation for the
software’s architecture. The phases of implementation
and testing are critical for transforming conceptual
designs into functioning software. Contributions. This
paper examines how generative AI models have been
integrated in a DevOps pipeline for helping in improving
the quality of the software released. We conducted an
experimentation on several repositories in Java and C# and
we demonstrated that our solution is able to fix around</p>
      <p>remove this useless assignment to local variable x
remove this unused x local variable
either log or rethrow this exception
throw a dedicated exception instead of a generic one
rename this field x to match the regular expression y
block of commented lines of code should be removed
merge this if statement with the enclosing one
add a x field to this class
cyclomatic complexity of this method x is greater than the authorized value
remove this expression which always evaluates to x
Auto Fix</p>
      <sec id="sec-2-1">
        <title>Best practices</title>
      </sec>
      <sec id="sec-2-2">
        <title>Unused fields</title>
      </sec>
      <sec id="sec-2-3">
        <title>Exception handling</title>
      </sec>
      <sec id="sec-2-4">
        <title>Code structure</title>
      </sec>
      <sec id="sec-2-5">
        <title>Code complexity</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Modelling Approach</title>
      <p>In this section, we describe the LLM models that have
been used, how the prompt has been engineered so that
it efectively performs for our task as well as the
classification of the issues based on a taxonomy that we defined.</p>
      <sec id="sec-3-1">
        <title>3.1. Model Selection</title>
        <p>We evaluated the following models:
1. (OpenAI) gpt-3.5-turbo-0613
2. (OpenAI) gpt-3.5-turbo-1106
3. (OpenAI) gpt-4-0613
4. (MetaAI) llama-2-7b-hf
We used OpenAI models via API on Cloud while we
finetuned the Llama 2 model. Fine-tuning has been carried
out by giving examples of snippets pairs incorrect
code/correct code extracted from our internal
repositories.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Framework</title>
        <p>We conducted an analysis about the distribution of issues
being fixed by the proposed approach among five classes
that we defined (see Figure 1). On unused fields,
best practices and code structure classes the
proposed solution is able to correct around 50% of the
issues whilst on the remaining two classes the fixing
rate is around 30%.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Prompt</title>
        <p>Engineered Prompt
System You are ChatGPT, a code snippet fixer. Your task
is to generate a fix for the provided code snippet
based on the given error message. Do not alter
the code snippet other than fixing the error.
Incomplete code should remain incomplete.
Submit your response in JSON format with the
keys: corrected_code, correction_flag,
explanation, renamed_variables.
corrected_code should be contained
in double quotes, and all double quotes in
the code snippet should be escaped with a
backslash. correction_flag should be
1 if you have corrected the code snippet,
0 otherwise. The explanation field should
contain a brief explanation of the correction.
renamed_variables should be a Python
dictionary containing the names of custom
user defined functions or variables that you
have renamed as keys, and their new names as
values. Do not add any builtin functions you
might have changed to renamed_variables.
User I have encountered an error.</p>
        <p>Error message: "System.Exception"
should not be thrown by user
Code snippet:
if (archiveResult.Result &lt;= 0) {
await sess.AbortTrans();</p>
        <p>throw new Exception("Fail"); }
Please fix the error in the code snippet without
completing it. The code must remain
incomplete and indented as in the original snippet.
Please provide a JSON response.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Post-Processing</title>
        <p>Following an analysis of common issues observed in
code returned by generative models, a series of
postprocessing functions have been implemented to enhance
the quality of the response both in terms of writing style
and integration with actual code. This manipulation
occurs before the code is inserted into files, prior to
undergoing quality checks and software compilation.</p>
        <p>Autocompletion errors prevention: Generative Table 1
models often tend to complete the input code, which fre- Results on Java repositories.
quently consist of incomplete fragments, such as if or for
statements without subsequent blocks, or portions that
lack logical coherence when considered out of context. Unused variables/fields: in this class there are five
To address this issue, lines generated as completions of SonarQube rules. In order to better understand what
these snippets can be removed, considering the error oc- kind of issues belong to this class, here two examples: 1)
curs midway through the original snippet. Using Greedy remove this useless assignment to local variable x and 2)
String Tiling, a metric employed in literature for compar- remove this unused x local variable.
ing code strings, the last lines of the generated code are Exception handling: in this class there are six
Sonarcompared with those from the input. If a match with the Qube rules. The type of issues belonging to this class are
original code’s nfial line is identified, only the preceding for example: 1) either log or rethrow this exception and 2)
part up to that line is retained for insertion into the file. throw a dedicated exception instead of a generic one.</p>
        <p>Indentation correction: The generated code often Best practices/conventions: in this class there are
loses the information regarding indentation levels, result- twenty-seven SonarQube rules. The type of issues
being in snippets where the indentation style and depth longing to this class are for example: 1) rename this field
may difer from the original code. This discrepancy can x to match the regular expression y and 2) block of
cominclude variations in both indentation style (such as tabs mented lines of code should be removed.
versus spaces) and indentation depth within the snippet. Code structure/elements: in this class there are</p>
        <p>Despite the flexible rules regarding indentation in cur- thirty-five SonarQube rules. The type of issues belonging
rently supported languages, a method has been imple- to this class are for example: 1) merge this if statement
mented to address this issue. This approach, again based with the enclosing one and 2) add a x field to this class .
on Greedy String Tiling, compares lines between input Code complexity: in this class there are ten
Sonarand output code to identify and apply a base indenta- Qube rules. The type of issues belonging to this class are
tion level that aligns with the indentation found in the for example: 1) the cyclomatic complexity of this method
received snippet. This ensures improved readability and x is greater than the authorized value and 2) remove this
quality of the generated code snippet, which is guaran- expression which always evaluates to x.
teed to have consistent indentation with the surrounding
code.</p>
      </sec>
      <sec id="sec-3-5">
        <title>4.2. Performance Results</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Experimental Evaluation</title>
      <p>In this section, we report a study on how issues are
distributed and results on two languages that are the most
widely used by developers.</p>
      <sec id="sec-4-1">
        <title>4.1. Issue Distribution</title>
        <p>We conducted an analysis about the distribution of issues
being fixed by the proposed approach among five classes
that we defined (see Figure 2).</p>
        <p>Looking at the plot, on three classes the proposed solution
is able to correct around 50% of the issues whilst on the
remaining two classes the fixing rate is around 30%.
We report the quantitative evaluation of the proposed
solution on the two languages of the experimentation.
In Table 1, we summarize the results on Java language
on which an average debit reduction of 29,4% has been
reached, with a peak of 63.0%. The pipeline executes on
average 5.5 times faster than developers with a peak of
17 times. In Table 2, we summarize the results on C#
language on which an average debit reduction of 25,9%
has been obtained, with a peak of 42.9%. The pipeline
executes on average 2.4 times faster than developers with
a peak of 4.8 times. Results on C# are slightly worst
because code is more complex and for building and
analyzing the code more time is required with respect to
Java.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions</title>
      <p>In conclusion, our comprehensive study elucidates the
multifaceted nature of the software development
process, ofering insights into optimizing development
practices to meet and exceed the demanding requirements
of today’s applications. The integration of generative AI
models into the software development lifecycle marks a
significant advancement, showcasing the potential to
revolutionize how software is developed, tested, and
maintained. This paper contributes to the body of knowledge
by demonstrating the efectiveness of these models in
improving software quality and development eficiency,
setting a precedent for future research and application
in the field of software engineering.
[7] M. Allamanis, M. Brockschmidt, M. Khademi, Learn- [8] M. Vasic, A. Kanade, P. Maniatis, D. Bieber, R. Singh,
ing to represent programs with graphs, arXiv Neural program repair by jointly learning to localize
preprint arXiv:1711.00740 (2017). and repair, arXiv preprint arXiv:1904.01720 (2019).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Seo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Sadowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Elbaum</surname>
          </string-name>
          , E. Aftandilian,
          <string-name>
            <given-names>R.</given-names>
            <surname>Bowdidge</surname>
          </string-name>
          ,
          <article-title>Programmers' build errors: a case study (at google)</article-title>
          ,
          <source>in: 36th International Conference on Software Engineering</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>724</fpage>
          -
          <lpage>734</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>E.</given-names>
            <surname>Dinella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Dai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Naik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Song</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>Hoppity: learning graph transformations to detect and fix bugs in programs</article-title>
          ,
          <source>in: International Conference on Learning Representations (ICLR)</source>
          ,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ding</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ray</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Devanbu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. J.</given-names>
            <surname>Hellendoorn</surname>
          </string-name>
          ,
          <article-title>Patching as translation: the data and the metaphor</article-title>
          ,
          <source>in: 35th ACM International Conference on Automated Software Engineering</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>275</fpage>
          -
          <lpage>286</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Mesbah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rice</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Johnston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Glorioso</surname>
          </string-name>
          , E. Aftandilian,
          <article-title>Deepdelta: learning to repair compilation errors</article-title>
          ,
          <source>in: 27th ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>925</fpage>
          -
          <lpage>936</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Sadowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Van</given-names>
            <surname>Gogh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Jaspan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Soderberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Winter</surname>
          </string-name>
          ,
          <article-title>Tricorder: building a program analysis ecosystem</article-title>
          ,
          <source>in: 37th International Conference on Software Engineering</source>
          , volume
          <volume>1</volume>
          ,
          <year>2015</year>
          , pp.
          <fpage>598</fpage>
          -
          <lpage>608</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bader</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Scott</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pradel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chandra</surname>
          </string-name>
          ,
          <article-title>Getafix: learning to fix bugs automatically</article-title>
          ,
          <source>ACM on Programming Languages</source>
          <volume>3</volume>
          (
          <year>2019</year>
          )
          <fpage>1</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>