=Paper=
{{Paper
|id=Vol-3762/466
|storemode=property
|title=GiottoBugFixer: an effective and scalable easy-to-use framework for fixing software issues in a DevOps pipeline
|pdfUrl=https://ceur-ws.org/Vol-3762/466.pdf
|volume=Vol-3762
|authors=Placido Pellegriti,Carmine Cisca,Fabio Previtali
|dblpUrl=https://dblp.org/rec/conf/ital-ia/PellegritiCP24
}}
==GiottoBugFixer: an effective and scalable easy-to-use framework for fixing software issues in a DevOps pipeline==
GiottoBugFixer: an effective and scalable easy-to-use
framework for fixing software issues in a DevOps pipeline
Placido Pellegriti1 , Carmine Cisca1 and Fabio Previtali1,*
1
AlmavivA S.p.A., Via di Casal Boccone 188/190, Rome, 00137, Italy
Abstract
Developing software is one of the most important and crucial activity in the IT domain. It is an important, challenging and
time consuming activity due to many factors that spaces from software complexity up to testing and deployment phases.
In the past decades, a plethora of tools have been released for helping developers in coding faster, however they are now
becoming ineffective and unable to keep up with the change affecting the IT development.
This paper investigates the potential of generative AI in the realm of software development, focusing on how these
technologies can augment the coding process, from initial concept to final deployment. It begins by delineating the fundamental
mechanisms through which generative AI models, such as code completions and automated code generation can enhance
developer productivity, reduce error rates and streamline the software development lifecycle. We conducted an experimentation
on several repositories obtaining around 25% of software issues automatically fixed with a 17x speed up.
Keywords
Platform Engineering, Software Automation, Generative AI
1. Introduction 25% of software issues 17x faster than a developer.
In the rapidly evolving field of software engineering, un-
derstanding the intricacies of the software development 2. Related Work
process is crucial for delivering high-quality, efficient and
reliable software solutions. This paper delves into the Developing an automatic code fixer is key for enhancing
comprehensive study of the software development lifecy- programming productivity [1] and is an active area of
cle, focusing on pivotal aspects such as code quality, im- research [2, 3, 4].
plementation and testing. By dissecting these elements, This trend has gained increasing popularity in recent
we aim to offer insights into optimizing the development years. Examples include Google’s Tricorder [5], Face-
process, ensuring that software not only meets but ex- book’s Getafix [6] and Zoncolan and Microsoft’s Visual
ceeds the rigorous demands of applications to be realized. Studio IntelliCode. The techniques underlying these tools
At the heart of any software project lies the quality can be classified into broadly two categories: logical, rule-
of its code, which serves as the cornerstone for func- based techniques [5] and statistical, data-driven tech-
tionality, maintainability, and scalability. We explore niques [7, 6, 8]. The former uses manually written rules
methodologies and practices such as code reviews, static capturing undesirable code patterns and scans the entire
code analysis, and adherence to coding standards that codebase for these classes of bugs. The latter learns to
contribute to enhancing code quality. By integrating detect abnormal code from a large code corpus using
these practices, developers can reduce bugs, facilitate deep neural networks.
easier updates, and ensure a robust foundation for the Despite great strides, however, both kinds of tools are
software’s architecture. The phases of implementation limited in generality because they target error patterns in
and testing are critical for transforming conceptual de- specific codebases or they target specific bug types. For
signs into functioning software. Contributions. This instance, Zoncolan’s rules are designed to be specifically
paper examines how generative AI models have been applicable to Facebook’s codebases, and deep learning
integrated in a DevOps pipeline for helping in improving models target specialized bugs in variable naming [7]
the quality of the software released. We conducted an ex- or binary expressions [6]. Moreover, the patterns are
perimentation on several repositories in Java and C# and relatively syntactic, allowing them to be specified by
we demonstrated that our solution is able to fix around human experts using logical rulesor learnt from a corpus
of programs.
Ital-IA 2024: 4th National Conference on Artificial Intelligence, orga- In this paper, we propose an effective and scalable easy-
nized by CINI, May 29-30, 2024, Naples, Italy
* to-use framework for fixing software issues in a DevOps
Corresponding author.
$ p.pellegriti@almaviva.it (P. Pellegriti); c.cisca@almaviva.it pipeline by means of an LLM model (i.e., GPT3.51 ).
(C. Cisca); f.previtali@almaviva.it (F. Previtali)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License 1
Attribution 4.0 International (CC BY 4.0). https://openai.com
CEUR
ceur-ws.org
Workshop ISSN 1613-0073
Proceedings
remove this useless assignment to local variable x
Unused fields
remove this unused x local variable
either log or rethrow this exception
Exception handling
throw a dedicated exception instead of a generic one
rename this field x to match the regular expression y
Auto Fix Best practices
block of commented lines of code should be removed
merge this if statement with the enclosing one
Code structure
add a x field to this class
cyclomatic complexity of this method x is greater than the authorized value
Code complexity
remove this expression which always evaluates to x
Figure 1: Issue distribution being fixed by the proposed approach among five classes that we defined.
3. Modelling Approach 3.3. Prompt
In this section, we describe the LLM models that have
been used, how the prompt has been engineered so that Engineered Prompt
System You are ChatGPT, a code snippet fixer. Your task
it effectively performs for our task as well as the classifi-
is to generate a fix for the provided code snippet
cation of the issues based on a taxonomy that we defined.
based on the given error message. Do not alter
the code snippet other than fixing the error.
3.1. Model Selection Incomplete code should remain incomplete.
Submit your response in JSON format with the
We evaluated the following models: keys: corrected_code, correction_flag,
explanation, renamed_variables.
1. (OpenAI) gpt-3.5-turbo-0613
corrected_code should be contained
2. (OpenAI) gpt-3.5-turbo-1106 in double quotes, and all double quotes in
3. (OpenAI) gpt-4-0613 the code snippet should be escaped with a
4. (MetaAI) llama-2-7b-hf backslash. correction_flag should be
1 if you have corrected the code snippet,
We used OpenAI models via API on Cloud while we fine- 0 otherwise. The explanation field should
tuned the Llama 2 model. Fine-tuning has been carried contain a brief explanation of the correction.
out by giving examples of snippets pairs incorrect renamed_variables should be a Python
code/correct code extracted from our internal repos- dictionary containing the names of custom
itories. user defined functions or variables that you
have renamed as keys, and their new names as
values. Do not add any builtin functions you
3.2. Framework might have changed to renamed_variables.
User I have encountered an error.
We conducted an analysis about the distribution of issues
Error message: "System.Exception"
being fixed by the proposed approach among five classes should not be thrown by user
that we defined (see Figure 1). On unused fields, Code snippet:
best practices and code structure classes the if (archiveResult.Result <= 0) {
proposed solution is able to correct around 50% of the await sess.AbortTrans();
issues whilst on the remaining two classes the fixing throw new Exception("Fail"); }
rate is around 30%. Please fix the error in the code snippet without
completing it. The code must remain incom-
plete and indented as in the original snippet.
Please provide a JSON response.
3.4. Post-Processing Repo Issue Fixed Tec. Debit Red. Speed Up
#1 100.0 % 63.0 % 10.3x
Following an analysis of common issues observed in #2 41.0 % 13.1 % 2.4x
code returned by generative models, a series of post- #3 36.6 % 10.3 % 2.2x
processing functions have been implemented to enhance #4 32.5 % 20.0 % 2.0x
the quality of the response both in terms of writing style #5 46.5 % 26.6 % 2.9x
#6 58.3 % 46.4 % 17.0x
and integration with actual code. This manipulation oc-
#7 47.3 % 26.7 % 2.0x
curs before the code is inserted into files, prior to under-
Avg 51.7 % 29.4 % 5.5x
going quality checks and software compilation.
Autocompletion errors prevention: Generative Table 1
models often tend to complete the input code, which fre- Results on Java repositories.
quently consist of incomplete fragments, such as if or for
statements without subsequent blocks, or portions that
lack logical coherence when considered out of context. Unused variables/fields: in this class there are five
To address this issue, lines generated as completions of SonarQube rules. In order to better understand what
these snippets can be removed, considering the error oc- kind of issues belong to this class, here two examples: 1)
curs midway through the original snippet. Using Greedy remove this useless assignment to local variable x and 2)
String Tiling, a metric employed in literature for compar- remove this unused x local variable.
ing code strings, the last lines of the generated code are Exception handling: in this class there are six Sonar-
compared with those from the input. If a match with the Qube rules. The type of issues belonging to this class are
original code’s final line is identified, only the preceding for example: 1) either log or rethrow this exception and 2)
part up to that line is retained for insertion into the file. throw a dedicated exception instead of a generic one.
Indentation correction: The generated code often Best practices/conventions: in this class there are
loses the information regarding indentation levels, result- twenty-seven SonarQube rules. The type of issues be-
ing in snippets where the indentation style and depth longing to this class are for example: 1) rename this field
may differ from the original code. This discrepancy can x to match the regular expression y and 2) block of com-
include variations in both indentation style (such as tabs mented lines of code should be removed.
versus spaces) and indentation depth within the snippet. Code structure/elements: in this class there are
Despite the flexible rules regarding indentation in cur- thirty-five SonarQube rules. The type of issues belonging
rently supported languages, a method has been imple- to this class are for example: 1) merge this if statement
mented to address this issue. This approach, again based with the enclosing one and 2) add a x field to this class.
on Greedy String Tiling, compares lines between input Code complexity: in this class there are ten Sonar-
and output code to identify and apply a base indenta- Qube rules. The type of issues belonging to this class are
tion level that aligns with the indentation found in the for example: 1) the cyclomatic complexity of this method
received snippet. This ensures improved readability and x is greater than the authorized value and 2) remove this
quality of the generated code snippet, which is guaran- expression which always evaluates to x.
teed to have consistent indentation with the surrounding
code. 4.2. Performance Results
We report the quantitative evaluation of the proposed
4. Experimental Evaluation solution on the two languages of the experimentation.
In Table 1, we summarize the results on Java language
In this section, we report a study on how issues are dis- on which an average debit reduction of 29,4% has been
tributed and results on two languages that are the most reached, with a peak of 63.0%. The pipeline executes on
widely used by developers. average 5.5 times faster than developers with a peak of
17 times. In Table 2, we summarize the results on C#
4.1. Issue Distribution language on which an average debit reduction of 25,9%
has been obtained, with a peak of 42.9%. The pipeline ex-
We conducted an analysis about the distribution of issues
ecutes on average 2.4 times faster than developers with
being fixed by the proposed approach among five classes
a peak of 4.8 times. Results on C# are slightly worst
that we defined (see Figure 2).
because code is more complex and for building and an-
Looking at the plot, on three classes the proposed solution
alyzing the code more time is required with respect to
is able to correct around 50% of the issues whilst on the
Java.
remaining two classes the fixing rate is around 30%.
Figure 2: Distribution of the fixed issues on five classes.
Repo Issue Fixed Tec. Debit Red. Speed Up References
#1 46.7 % 36.4 % 2.5x
#2 30.6 % 12.8 % 1.6x [1] H. Seo, C. Sadowski, S. Elbaum, E. Aftandilian,
#3 39.6 % 18.5 % 1.3x
R. Bowdidge, Programmers’ build errors: a case
#4 32.4 % 14.3 % 0.7x
#5 37.1 % 34.5 % 2.7x
study (at google), in: 36th International Conference
#6 38.2 % 21.8 % 3.0x on Software Engineering, 2014, pp. 724–734.
#7 61.2 % 42.9 % 4.8x [2] E. Dinella, H. Dai, Z. Li, M. Naik, L. Song, K. Wang,
Avg 40.8 % 25.9 % 2.4x Hoppity: learning graph transformations to detect
and fix bugs in programs, in: International Confer-
Table 2
ence on Learning Representations (ICLR), 2020.
Results on C# repositories.
[3] Y. Ding, B. Ray, P. Devanbu, V. J. Hellendoorn, Patch-
ing as translation: the data and the metaphor, in:
35th ACM International Conference on Automated
5. Conclusions Software Engineering, 2020, pp. 275–286.
[4] A. Mesbah, A. Rice, E. Johnston, N. Glorioso, E. Af-
In conclusion, our comprehensive study elucidates the
tandilian, Deepdelta: learning to repair compilation
multifaceted nature of the software development pro-
errors, in: 27th ACM Joint Meeting on European
cess, offering insights into optimizing development prac-
Software Engineering Conference and Symposium
tices to meet and exceed the demanding requirements
on the Foundations of Software Engineering, 2019,
of today’s applications. The integration of generative AI
pp. 925–936.
models into the software development lifecycle marks a
[5] C. Sadowski, J. Van Gogh, C. Jaspan, E. Soderberg,
significant advancement, showcasing the potential to rev-
C. Winter, Tricorder: building a program analysis
olutionize how software is developed, tested, and main-
ecosystem, in: 37th International Conference on
tained. This paper contributes to the body of knowledge
Software Engineering, volume 1, 2015, pp. 598–608.
by demonstrating the effectiveness of these models in
[6] J. Bader, A. Scott, M. Pradel, S. Chandra, Getafix:
improving software quality and development efficiency,
learning to fix bugs automatically, ACM on Pro-
setting a precedent for future research and application
gramming Languages 3 (2019) 1–27.
in the field of software engineering.
[7] M. Allamanis, M. Brockschmidt, M. Khademi, Learn- [8] M. Vasic, A. Kanade, P. Maniatis, D. Bieber, R. Singh,
ing to represent programs with graphs, arXiv Neural program repair by jointly learning to localize
preprint arXiv:1711.00740 (2017). and repair, arXiv preprint arXiv:1904.01720 (2019).