=Paper= {{Paper |id=Vol-3762/466 |storemode=property |title=GiottoBugFixer: an effective and scalable easy-to-use framework for fixing software issues in a DevOps pipeline |pdfUrl=https://ceur-ws.org/Vol-3762/466.pdf |volume=Vol-3762 |authors=Placido Pellegriti,Carmine Cisca,Fabio Previtali |dblpUrl=https://dblp.org/rec/conf/ital-ia/PellegritiCP24 }} ==GiottoBugFixer: an effective and scalable easy-to-use framework for fixing software issues in a DevOps pipeline== https://ceur-ws.org/Vol-3762/466.pdf
                                GiottoBugFixer: an effective and scalable easy-to-use
                                framework for fixing software issues in a DevOps pipeline
                                Placido Pellegriti1 , Carmine Cisca1 and Fabio Previtali1,*
                                1
                                    AlmavivA S.p.A., Via di Casal Boccone 188/190, Rome, 00137, Italy


                                                  Abstract
                                                   Developing software is one of the most important and crucial activity in the IT domain. It is an important, challenging and
                                                   time consuming activity due to many factors that spaces from software complexity up to testing and deployment phases.
                                                   In the past decades, a plethora of tools have been released for helping developers in coding faster, however they are now
                                                   becoming ineffective and unable to keep up with the change affecting the IT development.
                                                       This paper investigates the potential of generative AI in the realm of software development, focusing on how these
                                                   technologies can augment the coding process, from initial concept to final deployment. It begins by delineating the fundamental
                                                   mechanisms through which generative AI models, such as code completions and automated code generation can enhance
                                                   developer productivity, reduce error rates and streamline the software development lifecycle. We conducted an experimentation
                                                   on several repositories obtaining around 25% of software issues automatically fixed with a 17x speed up.

                                                   Keywords
                                                   Platform Engineering, Software Automation, Generative AI



                                1. Introduction                                                                                             25% of software issues 17x faster than a developer.

                                In the rapidly evolving field of software engineering, un-
                                derstanding the intricacies of the software development                                                     2. Related Work
                                process is crucial for delivering high-quality, efficient and
                                reliable software solutions. This paper delves into the                                                     Developing an automatic code fixer is key for enhancing
                                comprehensive study of the software development lifecy-                                                     programming productivity [1] and is an active area of
                                cle, focusing on pivotal aspects such as code quality, im-                                                  research [2, 3, 4].
                                plementation and testing. By dissecting these elements,                                                        This trend has gained increasing popularity in recent
                                we aim to offer insights into optimizing the development                                                    years. Examples include Google’s Tricorder [5], Face-
                                process, ensuring that software not only meets but ex-                                                      book’s Getafix [6] and Zoncolan and Microsoft’s Visual
                                ceeds the rigorous demands of applications to be realized.                                                  Studio IntelliCode. The techniques underlying these tools
                                   At the heart of any software project lies the quality                                                    can be classified into broadly two categories: logical, rule-
                                of its code, which serves as the cornerstone for func-                                                      based techniques [5] and statistical, data-driven tech-
                                tionality, maintainability, and scalability. We explore                                                     niques [7, 6, 8]. The former uses manually written rules
                                methodologies and practices such as code reviews, static                                                    capturing undesirable code patterns and scans the entire
                                code analysis, and adherence to coding standards that                                                       codebase for these classes of bugs. The latter learns to
                                contribute to enhancing code quality. By integrating                                                        detect abnormal code from a large code corpus using
                                these practices, developers can reduce bugs, facilitate                                                     deep neural networks.
                                easier updates, and ensure a robust foundation for the                                                         Despite great strides, however, both kinds of tools are
                                software’s architecture. The phases of implementation                                                       limited in generality because they target error patterns in
                                and testing are critical for transforming conceptual de-                                                    specific codebases or they target specific bug types. For
                                signs into functioning software. Contributions. This                                                        instance, Zoncolan’s rules are designed to be specifically
                                paper examines how generative AI models have been                                                           applicable to Facebook’s codebases, and deep learning
                                integrated in a DevOps pipeline for helping in improving                                                    models target specialized bugs in variable naming [7]
                                the quality of the software released. We conducted an ex-                                                   or binary expressions [6]. Moreover, the patterns are
                                perimentation on several repositories in Java and C# and                                                    relatively syntactic, allowing them to be specified by
                                we demonstrated that our solution is able to fix around                                                     human experts using logical rulesor learnt from a corpus
                                                                                                                                            of programs.
                                Ital-IA 2024: 4th National Conference on Artificial Intelligence, orga-                                        In this paper, we propose an effective and scalable easy-
                                nized by CINI, May 29-30, 2024, Naples, Italy
                                *                                                                                                           to-use framework for fixing software issues in a DevOps
                                  Corresponding author.
                                $ p.pellegriti@almaviva.it (P. Pellegriti); c.cisca@almaviva.it                                             pipeline by means of an LLM model (i.e., GPT3.51 ).
                                (C. Cisca); f.previtali@almaviva.it (F. Previtali)
                                             © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License   1
                                             Attribution 4.0 International (CC BY 4.0).                                                         https://openai.com




CEUR
                  ceur-ws.org
Workshop      ISSN 1613-0073
Proceedings
                                      remove this useless assignment to local variable x
                 Unused fields
                                      remove this unused x local variable

                                            either log or rethrow this exception
                 Exception handling
                                            throw a dedicated exception instead of a generic one

                                      rename this field x to match the regular expression y
 Auto Fix        Best practices
                                      block of commented lines of code should be removed

                                       merge this if statement with the enclosing one
                 Code structure
                                       add a x field to this class

                                         cyclomatic complexity of this method x is greater than the authorized value
                 Code complexity
                                         remove this expression which always evaluates to x


Figure 1: Issue distribution being fixed by the proposed approach among five classes that we defined.



3. Modelling Approach                                           3.3. Prompt
In this section, we describe the LLM models that have
been used, how the prompt has been engineered so that            Engineered Prompt
                                                                 System You are ChatGPT, a code snippet fixer. Your task
it effectively performs for our task as well as the classifi-
                                                                          is to generate a fix for the provided code snippet
cation of the issues based on a taxonomy that we defined.
                                                                          based on the given error message. Do not alter
                                                                          the code snippet other than fixing the error.
3.1. Model Selection                                                      Incomplete code should remain incomplete.
                                                                          Submit your response in JSON format with the
We evaluated the following models:                                        keys: corrected_code, correction_flag,
                                                                          explanation,               renamed_variables.
    1. (OpenAI) gpt-3.5-turbo-0613
                                                                          corrected_code should be contained
    2. (OpenAI) gpt-3.5-turbo-1106                                        in double quotes, and all double quotes in
    3. (OpenAI) gpt-4-0613                                                the code snippet should be escaped with a
    4. (MetaAI) llama-2-7b-hf                                             backslash.      correction_flag should be
                                                                          1 if you have corrected the code snippet,
We used OpenAI models via API on Cloud while we fine-                     0 otherwise. The explanation field should
tuned the Llama 2 model. Fine-tuning has been carried                     contain a brief explanation of the correction.
out by giving examples of snippets pairs incorrect                        renamed_variables should be a Python
code/correct code extracted from our internal repos-                      dictionary containing the names of custom
itories.                                                                  user defined functions or variables that you
                                                                          have renamed as keys, and their new names as
                                                                          values. Do not add any builtin functions you
3.2. Framework                                                            might have changed to renamed_variables.
                                                                 User     I have encountered an error.
We conducted an analysis about the distribution of issues
                                                                          Error message:            "System.Exception"
being fixed by the proposed approach among five classes                     should not be thrown by user
that we defined (see Figure 1). On unused fields,                           Code snippet:
best practices and code structure classes the                               if (archiveResult.Result <= 0) {
proposed solution is able to correct around 50% of the                      await sess.AbortTrans();
issues whilst on the remaining two classes the fixing                         throw new Exception("Fail"); }
rate is around 30%.                                                         Please fix the error in the code snippet without
                                                                            completing it. The code must remain incom-
                                                                            plete and indented as in the original snippet.
                                                                            Please provide a JSON response.
3.4. Post-Processing                                              Repo     Issue Fixed      Tec. Debit Red.   Speed Up
                                                                   #1        100.0 %             63.0 %         10.3x
Following an analysis of common issues observed in                 #2         41.0 %             13.1 %          2.4x
code returned by generative models, a series of post-              #3         36.6 %             10.3 %          2.2x
processing functions have been implemented to enhance              #4         32.5 %             20.0 %          2.0x
the quality of the response both in terms of writing style         #5         46.5 %             26.6 %          2.9x
                                                                   #6         58.3 %             46.4 %         17.0x
and integration with actual code. This manipulation oc-
                                                                   #7         47.3 %             26.7 %          2.0x
curs before the code is inserted into files, prior to under-
                                                                  Avg         51.7 %             29.4 %         5.5x
going quality checks and software compilation.
   Autocompletion errors prevention: Generative                 Table 1
models often tend to complete the input code, which fre-        Results on Java repositories.
quently consist of incomplete fragments, such as if or for
statements without subsequent blocks, or portions that
lack logical coherence when considered out of context.             Unused variables/fields: in this class there are five
To address this issue, lines generated as completions of        SonarQube rules. In order to better understand what
these snippets can be removed, considering the error oc-        kind of issues belong to this class, here two examples: 1)
curs midway through the original snippet. Using Greedy          remove this useless assignment to local variable x and 2)
String Tiling, a metric employed in literature for compar-      remove this unused x local variable.
ing code strings, the last lines of the generated code are         Exception handling: in this class there are six Sonar-
compared with those from the input. If a match with the         Qube rules. The type of issues belonging to this class are
original code’s final line is identified, only the preceding    for example: 1) either log or rethrow this exception and 2)
part up to that line is retained for insertion into the file.   throw a dedicated exception instead of a generic one.
   Indentation correction: The generated code often                Best practices/conventions: in this class there are
loses the information regarding indentation levels, result-     twenty-seven SonarQube rules. The type of issues be-
ing in snippets where the indentation style and depth           longing to this class are for example: 1) rename this field
may differ from the original code. This discrepancy can         x to match the regular expression y and 2) block of com-
include variations in both indentation style (such as tabs      mented lines of code should be removed.
versus spaces) and indentation depth within the snippet.           Code structure/elements: in this class there are
   Despite the flexible rules regarding indentation in cur-     thirty-five SonarQube rules. The type of issues belonging
rently supported languages, a method has been imple-            to this class are for example: 1) merge this if statement
mented to address this issue. This approach, again based        with the enclosing one and 2) add a x field to this class.
on Greedy String Tiling, compares lines between input              Code complexity: in this class there are ten Sonar-
and output code to identify and apply a base indenta-           Qube rules. The type of issues belonging to this class are
tion level that aligns with the indentation found in the        for example: 1) the cyclomatic complexity of this method
received snippet. This ensures improved readability and         x is greater than the authorized value and 2) remove this
quality of the generated code snippet, which is guaran-         expression which always evaluates to x.
teed to have consistent indentation with the surrounding
code.                                                           4.2. Performance Results
                                                            We report the quantitative evaluation of the proposed
4. Experimental Evaluation                                  solution on the two languages of the experimentation.
                                                            In Table 1, we summarize the results on Java language
In this section, we report a study on how issues are dis- on which an average debit reduction of 29,4% has been
tributed and results on two languages that are the most reached, with a peak of 63.0%. The pipeline executes on
widely used by developers.                                  average 5.5 times faster than developers with a peak of
                                                            17 times. In Table 2, we summarize the results on C#
4.1. Issue Distribution                                     language on which an average debit reduction of 25,9%
                                                            has been obtained, with a peak of 42.9%. The pipeline ex-
We conducted an analysis about the distribution of issues
                                                            ecutes on average 2.4 times faster than developers with
being fixed by the proposed approach among five classes
                                                            a peak of 4.8 times. Results on C# are slightly worst
that we defined (see Figure 2).
                                                            because code is more complex and for building and an-
Looking at the plot, on three classes the proposed solution
                                                            alyzing the code more time is required with respect to
is able to correct around 50% of the issues whilst on the
                                                            Java.
remaining two classes the fixing rate is around 30%.
Figure 2: Distribution of the fixed issues on five classes.



  Repo      Issue Fixed       Tec. Debit Red.     Speed Up    References
   #1          46.7 %              36.4 %           2.5x
   #2          30.6 %              12.8 %           1.6x      [1] H. Seo, C. Sadowski, S. Elbaum, E. Aftandilian,
   #3          39.6 %              18.5 %           1.3x
                                                                  R. Bowdidge, Programmers’ build errors: a case
   #4          32.4 %              14.3 %           0.7x
   #5          37.1 %              34.5 %           2.7x
                                                                  study (at google), in: 36th International Conference
   #6          38.2 %              21.8 %           3.0x          on Software Engineering, 2014, pp. 724–734.
   #7          61.2 %              42.9 %           4.8x      [2] E. Dinella, H. Dai, Z. Li, M. Naik, L. Song, K. Wang,
  Avg          40.8 %              25.9 %           2.4x          Hoppity: learning graph transformations to detect
                                                                  and fix bugs in programs, in: International Confer-
Table 2
                                                                  ence on Learning Representations (ICLR), 2020.
Results on C# repositories.
                                                              [3] Y. Ding, B. Ray, P. Devanbu, V. J. Hellendoorn, Patch-
                                                                  ing as translation: the data and the metaphor, in:
                                                                  35th ACM International Conference on Automated
5. Conclusions                                                    Software Engineering, 2020, pp. 275–286.
                                                              [4] A. Mesbah, A. Rice, E. Johnston, N. Glorioso, E. Af-
In conclusion, our comprehensive study elucidates the
                                                                  tandilian, Deepdelta: learning to repair compilation
multifaceted nature of the software development pro-
                                                                  errors, in: 27th ACM Joint Meeting on European
cess, offering insights into optimizing development prac-
                                                                  Software Engineering Conference and Symposium
tices to meet and exceed the demanding requirements
                                                                  on the Foundations of Software Engineering, 2019,
of today’s applications. The integration of generative AI
                                                                  pp. 925–936.
models into the software development lifecycle marks a
                                                              [5] C. Sadowski, J. Van Gogh, C. Jaspan, E. Soderberg,
significant advancement, showcasing the potential to rev-
                                                                  C. Winter, Tricorder: building a program analysis
olutionize how software is developed, tested, and main-
                                                                  ecosystem, in: 37th International Conference on
tained. This paper contributes to the body of knowledge
                                                                  Software Engineering, volume 1, 2015, pp. 598–608.
by demonstrating the effectiveness of these models in
                                                              [6] J. Bader, A. Scott, M. Pradel, S. Chandra, Getafix:
improving software quality and development efficiency,
                                                                  learning to fix bugs automatically, ACM on Pro-
setting a precedent for future research and application
                                                                  gramming Languages 3 (2019) 1–27.
in the field of software engineering.
[7] M. Allamanis, M. Brockschmidt, M. Khademi, Learn- [8] M. Vasic, A. Kanade, P. Maniatis, D. Bieber, R. Singh,
    ing to represent programs with graphs, arXiv          Neural program repair by jointly learning to localize
    preprint arXiv:1711.00740 (2017).                     and repair, arXiv preprint arXiv:1904.01720 (2019).