=Paper=
{{Paper
|id=Vol-3762/523
|storemode=property
|title=Using Large Language Models to Support Software Engineering Documentation in Waterfall Life Cycles: Are We There Yet?
|pdfUrl=https://ceur-ws.org/Vol-3762/523.pdf
|volume=Vol-3762
|authors=Antonio Della Porta,Vincenzo De Martino,Gilberto Recupito,Carmine Iemmino,Gemma Catolino,Dario Di Nucci,Fabio Palomba
|dblpUrl=https://dblp.org/rec/conf/ital-ia/PortaMRICNP24
}}
==Using Large Language Models to Support Software Engineering Documentation in Waterfall Life Cycles: Are We There Yet?==
Antonio Della Porta¹,∗,†, Vincenzo De Martino¹,†, Gilberto Recupito¹,†, Carmine Iemmino¹,†, Gemma Catolino¹,†, Dario Di Nucci¹,† and Fabio Palomba¹,†

¹ SeSa Lab - Università Degli Studi di Salerno, Via Giovanni Paolo II, 132, 84084 Fisciano, Salerno, Italy
Abstract
Software documentation is key for producing high-quality projects and ensuring their smooth evolution. Nonetheless, the
activity of writing software artifacts is time-consuming and effort-prone. Looking at the existing body of knowledge, we
outline limited evidence of how automated approaches may support practitioners when documenting the artifacts produced
throughout the software lifecycle. In particular, there is still a lack of investigations into the capabilities of Large Language
Models (LLMs), which are indeed supposed to be highly beneficial in this respect. In this paper, we propose a preliminary
case study to understand how LLMs can support the development of the documentation of projects developed through a
Waterfall lifecycle. Using ChatGPT, we engineered specific prompts to generate and validate the artifacts produced, taking an
existing, documented software engineering project as an oracle. The main findings of the study show the ability of ChatGPT
to produce most artifacts correctly. In addition, we find that software engineers would require a relatively low effort to adapt
the outputs provided by ChatGPT to their own context, especially for textual artifacts.
Keywords
Large Language Model, Artificial Intelligence for Software Engineering, ChatGPT
1. Introduction

Integrating Large Language Models (LLMs) into various domains has recently garnered significant attention. Recent statistics indicate that ChatGPT, a prominent example of an LLM, has gathered over 180 million users, underscoring the widespread adoption of such models [1]. LLMs showcase remarkable versatility, particularly in software engineering [2], leading practitioners to wonder how these models can effectively replicate their tasks. Hence, there is a need to explore their potential within the Software Development Lifecycle (SDLC). In particular, the literature has shown how LLMs can simulate team members in a development environment, perform code analysis, generate code, and predict bugs [3]. These AI-powered systems can analyze large amounts of code and data quickly and accurately, enabling the automation of repetitive tasks and allowing developers to focus on more complex issues [4].

These benefits have allowed the resolution of key issues in software engineering tasks, especially in software development and maintenance activities [5]. However, other software engineering tasks, especially those related to documentation, are still regarded as key challenges [6]. Since there is a lack of studies in this specific field, we aim to provide preliminary results on the capabilities of an LLM to tackle the challenge of crafting software documentation. We selected a Waterfall Life Cycle project to explore LLMs' documentation abilities across development phases, from requirements to technical details. Through this preliminary case study, we employed ChatGPT (https://chat.openai.com) to generate documentation artifacts.

We aim to evaluate ChatGPT's real-world efficiency by comparing it to a benchmark project and gauging the effort required to produce similarly high-quality artifacts. Preliminary findings suggest that ChatGPT eases documentation and speeds up design replication but requires human input for response refinement and query tuning. Initial integration efforts are moderate, but some artifacts necessitated revised prompts and external software to reach satisfactory outcomes.

Ital-IA 2024: 4th National Conference on Artificial Intelligence, organized by CINI, May 29-30, 2024, Naples, Italy
∗ Corresponding author.
† These authors contributed equally.
adellaporta@unisa.it (A. Della Porta); vdemartino@unisa.it (V. De Martino); grecupito@unisa.it (G. Recupito); c.iemmino@studenti.unisa.it (C. Iemmino); gcatolino@unisa.it (G. Catolino); ddinucci@unisa.it (D. Di Nucci); fpalomba@unisa.it (F. Palomba)
ORCID: 0000-0003-1860-8404 (A. Della Porta); 0000-0003-1485-4560 (V. De Martino); 0000-0001-8088-1001 (G. Recupito); 0000-0002-4689-3401 (G. Catolino); 0000-0003-4927-9324 (D. Di Nucci); 0000-0001-9337-5116 (F. Palomba)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2. Related Work

Artificial Intelligence for Software Engineering (AI4SE) is a well-known research area that aims to develop AI
solutions and SE practices to improve software development processes and tools [7, 8]. With the emergence and proliferation of LLMs, this field has encountered new opportunities to support and streamline the work of software engineers and researchers [5].

In the vein of such advancements, De Vito et al. [9] introduced ECHO, an innovative method utilizing LLMs to aid software engineers in improving the quality of UML use cases. Further extending the utility of AI in SE, De Vito et al. [10] presented C4SE, a chatbot designed for software engineering that streamlines tasks like code review, testing, and criteria evaluation. Ahmad et al. [11] explored the role of ChatGPT as a bot in collaborative software architecting to support the analysis, synthesis, and evaluation of microservices-based software. A study by Liang et al. [12] surveyed developers' perceptions, noting issues like code not meeting requirements. Despite these advancements, the domain of AI-assisted documentation in SE remains underexplored, especially regarding comprehensive support for the entire documentation lifecycle.

As Robillard et al. [13] highlighted, traditional documentation practices are inefficient because of the manual nature of their creation and the gap between creators and consumers. Aghajani et al. [14] reported that documentation suffers numerous shortcomings and problems, including insufficient and inadequate content and outdated and ambiguous information. Recent investigations have further explored the extent to which LLMs can assist in tasks like writing code [15], conducting code reviews [16], providing code explanations [17], and teaching programming concepts [18]. These studies suggest the potential of LLMs to provide significant support in the activities involved in the SDLC and to focus human effort on the quality and relevance of the results.

White et al. [19] emphasized the importance of prompt engineering to guide LLMs, presenting a catalog of patterns for dialoguing with LLMs to achieve satisfactory outputs. A well-written prompt enables correct answers while minimizing the number of prompts needed [20, 21, 22]. Our work builds on these studies, exploring how to use prompts to support the creation of documentation artifacts.

Our research is motivated by the goal of comprehensively understanding how ChatGPT can support both students and practitioners during the software development lifecycle, focusing on creating improved documentation of software systems. We aim to shed light on the role of ChatGPT and LLMs in simplifying the development process and to assess the complexities involved in using ChatGPT to produce high-quality results.

3. Research Method

The goal of the study was to determine to what extent LLMs can support the activities of a software engineer when writing documentation in a software project employing the Waterfall Life Cycle model, with the purpose of providing software engineers with elements that can be leveraged to support and improve the design process of software projects. The perspective is that of both researchers and practitioners. The former are interested in understanding the current potential and limitations of using LLMs for documentation tasks, possibly identifying opportunities for further research and improvement. The latter are interested in assessing how LLMs can act as documentation assistants in practice, verifying whether these models may be employed in real-world contexts and potentially integrating them into their workflow.

3.1. Research Question

Our research question aimed to understand whether LLMs can substantially support software documentation activities in projects developed using a Waterfall Life Cycle model. Understanding whether documentation writing activities supported by an LLM can improve artifacts and possibly reduce effort would be crucial. We chose ChatGPT because of its popularity and availability, in line with similar studies [23, 11]. In this context, we formulated the following research question.

RQ1. To what extent can ChatGPT support software engineering documentation tasks in a Waterfall Life Cycle model?

To address our research question, we conducted a preliminary case study [24] using an oracle project and comparing it to the output of the LLM to provide insights into its usefulness for documentation tasks. We followed the guidelines by Wohlin et al. [25] and the ACM/SIGSOFT Empirical Standards for the report (available at https://github.com/acmsigsoft/EmpiricalStandards; we leveraged the guidelines for "General Standard" and "Case Study").

3.2. Context of the Study

To address the goal of our work and provide preliminary insights into the capabilities of ChatGPT for documentation tasks, we selected a project named Rojina Review, a web-based platform for news and reviews of video games. The project has 100k lines of code and was initially developed by a team of three software engineering students at our university using a Waterfall lifecycle. On the one hand, we selected a fully developed project, i.e., one with the full set of artifacts already developed, to have a ground truth against which to assess the capabilities of ChatGPT.
On the other hand, the project was closely supervised by the paper's authors. We were familiar with the business case and the artifacts that should have been developed, and we were also confident of the quality of the project. We are aware of potential threats to internal and external validity related to this choice. However, we believe the project was good enough to ensure a satisfactory preliminary assessment. Following Bruegge and Dutoit [26], we briefly explain the documents created for this project in Table 1.

Table 1. Generated Artifacts
- Requirements Analysis Document (gathers and analyzes the system requirements): Scenarios, Functional Requirements, Non-Functional Requirements, Use Cases
- System Design Document (outlines the overall system architecture): Class Diagram, Sequence Diagram, Statechart Diagram, Design Goals, Subsystems Division, Software/Hardware Mapping, Boundary Conditions
- Object Design Document (defines the component design): Class Interfaces, Design Pattern
- Test Plan & Test Case Specification (describes how to test the system): Test Case Specifications, Category Partition

3.3. Formulating the Waterfall Story

Before starting our study, we gathered a working group to determine a suitable prompt for ChatGPT. We adopted a specific prompting process when interacting with ChatGPT for all artifacts to be created. This method guides the activities needed to produce documentation artifacts, simulating the phases of the Waterfall lifecycle model. In detail, the process includes three steps:

#1 - Initial interaction: We set up the environment in ChatGPT. Specifically, we adopted a single chat to interact and prevent the LLM from losing the project context. Subsequently, we provided ChatGPT with an initial prompt containing the preliminary information of the project. We asked ChatGPT to provide information concerning the problem statement.

#2 - Artifact generation: To maintain the context of the output generated in the previous phase, we asked ChatGPT to provide the previous artifact at each development phase.

#3 - Inter-rater assessment: Following the extraction of the answers provided by ChatGPT, an inter-rater assessment process was initiated by the three first authors to evaluate the generated output. The artifact produced by ChatGPT was compared with the same artifact in Rojina Review. The three first authors of the paper had to agree to make an artifact acceptable. In case of disagreement, a collaborative discussion was held to address and resolve assessment disparities. Afterward, the feedback was re-submitted to improve the quality of the artifact. In this case, the discussion about creating the artifact continued, and the feedback from this phase was provided to ChatGPT until the output was evaluated as compliant by the evaluators or the LLM could not respond better than in the previous phase.

When the third step of the process was completed, the second step was repeated to create the next artifact. Additionally, we noted that the language seemed more accurate when we asked ChatGPT to impersonate a software engineer. For this reason, we used a generic prompt that guided our research:

Prompt of Requirement Tasks

You have to impersonate a software engineer who has to produce the project documentation of a software project. Consider the following problem statement to generate the output: [problem statement]
#Optional: Given that you have [a previously generated artifact] (e.g., the non-functional requirements in the RAD)
Generate [artifact] for the scope of the software project that we defined
#Optional (only for UML artifacts): using the PlantUML syntax.

We then started to generate the documentation in an iterative and incremental process. The set of documentation artifacts, organized according to the Waterfall Model into the five main documents and related tasks, is specified in Table 1.

3.4. Data Extraction

From the documentation of the selected project, we extracted the document produced for each phase of the Waterfall Model and a set of the most important artifacts, as listed in Table 1. We produced a prompt for each artifact that ChatGPT could use to generate it. For the generation of the diagrams, we used PlantUML (source code available at https://github.com/plantuml/plantuml). This open-source tool allows users to create Unified Modeling Language (UML) diagrams using a plain text language.
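For illustration, the generic prompt described above can be assembled programmatically. The following Python sketch is not part of the study's tooling: the `build_prompt` function and its parameters are hypothetical, and only the template wording is taken from the generic prompt reported in Section 3.3.

```python
# Hypothetical sketch (not from the original study): composing the generic
# documentation prompt of Section 3.3 for a given artifact.

def build_prompt(problem_statement: str,
                 artifact: str,
                 prior_context: str = "",
                 plantuml: bool = False) -> str:
    """Compose the generic prompt used to request a documentation artifact."""
    parts = [
        "You have to impersonate a software engineer who has to produce "
        "the project documentation of a software project. Consider the "
        f"following problem statement to generate the output: {problem_statement}"
    ]
    if prior_context:  # optional clause, e.g. an artifact produced earlier
        parts.append(f"Given that you have {prior_context},")
    parts.append(f"generate {artifact} for the scope of the software project that we defined")
    if plantuml:  # optional clause, only for UML artifacts
        parts.append("using the PlantUML syntax.")
    return "\n".join(parts)


# Example: requesting the class diagram for the case-study system.
prompt = build_prompt(
    problem_statement="A web-based platform for news and reviews of video games.",
    artifact="the class diagram",
    plantuml=True,
)
```

The optional clauses mirror the `#Optional` parts of the template: earlier artifacts can be carried forward as context, and the PlantUML clause is added only when a UML diagram is requested.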
This choice follows the findings of Cámara et al. [27], who report that ChatGPT produces fewer syntactic mistakes and achieves significantly better results when using PlantUML compared to other tools, such as the USE tool (source code available at https://github.com/useocl/use).

3.5. Data Analysis

To analyze the results obtained using ChatGPT, the first three authors of the paper, who have significant experience in software engineering from both an academic and an enterprise perspective, defined a set of criteria to evaluate the effort needed by a software engineer who is to be supported in creating the artifacts of the documentation. Those criteria, listed in Table 2, consider the number of prompts needed and the level of adjustment of the prompt required to reach an optimal result from ChatGPT. The final acceptance of each artifact produced by ChatGPT was determined by comparing it with the same artifact in Rojina Review to assess its quality.

Table 2. Effort Mapping
- Low Effort: the desired answer is obtained with a maximum of two prompts, does not need to be particularly articulated, and does not require corrections, so it can be easily used.
- Medium Effort: the desired answer is produced with several prompts, ranging from three to five; the response may require manual modification where it is more complicated to have the bot adjust the response.
- High Effort: the desired answer is obtained with a minimum of six very detailed prompts, and the response requires manual corrections that the bot cannot implement.

4. Preliminary Results

We submitted the prompts to ChatGPT for each selected artifact to address our research question and obtained the results detailed in Table 3.

Table 3. Results
- Scenarios: Medium Effort
- Functional Requirements: Low Effort
- Non-Functional Requirements: Low Effort
- Use Cases: Medium Effort
- Class Diagram: High Effort
- Sequence Diagram: High Effort
- Statechart Diagram: Medium Effort
- Design Goals: Medium Effort
- Subsystems Division: High Effort
- Software/Hardware Mapping: Low Effort
- Boundary Conditions: Low Effort
- Class Interfaces: Low Effort
- Design Pattern: Low Effort
- Test Case Specification: Medium Effort
- Category Partition: High Effort

We started with the extraction of scenarios. During the interaction, we noted that ChatGPT finds it difficult to identify key elements in the context. For instance, actors involved in a specific functionality were switched compared to the context of the system given as input. Therefore, we added additional prompts to address these issues. Subsequently, we extracted the functional requirements; ChatGPT produced well-structured and formatted requirements after the first interaction. Along the same lines are the results for the non-functional requirements: once the functional ones were defined, ChatGPT was able to extract the related non-functional requirements directly with a single prompt. Use cases needed specific prompts for each system functionality defined previously. Moreover, additional prompts were required to obtain the alternative flows. For the class diagram, ChatGPT failed to produce a correct result with the right hierarchies, relationships, and cardinalities. We observed the need to write the specific string "system class diagram" to obtain results allowing ChatGPT to report associations among classes. For these reasons, the LLM failed to give a correct result.

On the one hand, for the statechart, a restricted number of prompts was needed to generate artifacts comparable to Rojina Review. On the other hand, the sequence diagrams needed more prompts with additional specifications to achieve a good result.

We needed a few prompts to generate the design goals; assigning and ordering them by priority needed more prompts. The subsystems division needed many prompts and corrections to get a result comparable with the artifact of Rojina Review because ChatGPT initially produced a semantically incorrect division, so we needed to provide more details and request the PlantUML code. There were no issues for software/hardware mapping, boundary conditions, class interfaces, and design patterns: ChatGPT was able to generate a good result without effort.

For the testing artifacts of the project, the category partition required many prompts and was very specific
for each functionality to test. Conversely, the test case specifications were easier, as they use the category partition as input to build each test.

5. Threats to Validity

Construct Validity. The main concern for construct validity in our study relates to subject selection, particularly the version of the AI model. For the evaluation, we used the GPT-3.5 model, the most advanced version available during the research. Even though the GPT-4 version has been released, its use is currently limited by strict rate limits, and early feedback from the user community suggests potential stability and accuracy issues.

Internal Validity. To ensure robust internal validity, we carefully considered factors that could influence the outcomes derived from the LLM. Recognizing that LLMs' responses are susceptible to prompt formulation, we conducted preliminary tests to identify the most effective prompt structures [19, 22]. This step was crucial to minimize variations in the model's responses that could arise from prompt-related biases, thereby ensuring that our findings more accurately reflect the capabilities of the LLM rather than the nuances of our prompt phrasing. Additionally, each interaction with the LLM was assessed iteratively by multiple authors through an inter-rater assessment, allowing the reduction of the subjectivity of the results. We evaluated the accuracy of the documents generated by ChatGPT using a high-quality project from an undergraduate software engineering course as an oracle. This comparison was critical to verify that the observed results were indeed attributable to ChatGPT's capabilities.

External Validity. The external validity threat examines whether the results of a study can be generalized to other contexts. We carried out only one case study of moderate complexity, which may limit the generalizability of the study. Scenarios with greater development complexity, different types of development processes (e.g., agile instead of waterfall), and the prompt-writing skills of humans may affect the external validity of this research. Future work may involve validating the process with project managers and a larger number of software projects to minimize this external threat to validity.

6. Conclusion and Future Work

In our study, we investigated to what extent ChatGPT can support software engineers in documenting waterfall projects. We compared its use with a high-quality university project, focusing on response variability, design impact, and the balance between AI support and human oversight. Our preliminary findings suggest that ChatGPT reduces time and effort. Future work will involve a longitudinal study with professional feedback, exploring how prompt generation expertise enhances real-world outputs.

Acknowledgments

This work has been partially supported by the European Union - NextGenerationEU through the Italian Ministry of University and Research, Projects PRIN 2022 "QualAI: Continuous Quality Improvement of AI-based Systems" (grant n. 2022B3BP5S, CUP: H53D23003510006) and PRIN 2022 PNRR "FRINGE: context-aware FaiRness engineerING in complex software systEms" (grant n. P2022553SL, CUP: D53D23017340001). The opinions presented in this article solely belong to the author(s) and do not necessarily reflect those of the European Union or the European Research Executive Agency. The European Union and the granting authority cannot be held accountable for these views.

References

[1] DemandSage, ChatGPT statistics for 2024 (user demographics and facts), 2024. URL: https://www.demandsage.com/chatgpt-statistics/, accessed: January 13, 2024.
[2] S. Wang, L. Huang, A. Gao, J. Ge, T. Zhang, H. Feng, I. Satyarth, M. Li, H. Zhang, V. Ng, Machine/deep learning for software engineering: A systematic literature review, IEEE Transactions on Software Engineering 49 (2023) 1188–1231. doi:10.1109/TSE.2022.3173346.
[3] L. Belzner, T. Gabor, M. Wirsing, Large language model assisted software engineering: prospects, challenges, and a case study, in: International Conference on Bridging the Gap between AI and Reality, Springer, 2023, pp. 355–374.
[4] Y. K. Dwivedi, N. Kshetri, L. Hughes, E. L. Slade, A. Jeyaraj, A. K. Kar, A. M. Baabdullah, A. Koohang, V. Raghavan, M. Ahuja, et al., "So what if ChatGPT wrote it?" Multidisciplinary perspectives on opportunities, challenges and implications of generative conversational AI for research, practice and policy, International Journal of Information Management 71 (2023) 102642.
[5] X. Hou, Y. Zhao, Y. Liu, Z. Yang, K. Wang, L. Li, X. Luo, D. Lo, J. Grundy, H. Wang, Large language models for software engineering: A systematic literature review, arXiv preprint arXiv:2308.10620 (2023).
[6] I. Ozkaya, Application of large language models to software engineering tasks: Opportunities, risks, and implications, IEEE Software 40 (2023) 4–8. doi:10.1109/MS.2023.3248401.
[7] M. Barenkamp, J. Rebstadt, O. Thomas, Applications of AI in classical software engineering, AI Perspectives 2 (2020) 1.
[8] T. Xie, Intelligent software engineering: Synergy between AI and software engineering, in: Proceedings of the 11th Innovations in Software Engineering Conference, 2018, pp. 1–1.
[9] G. De Vito, F. Palomba, C. Gravino, S. Di Martino, F. Ferrucci, ECHO: An approach to enhance use case quality exploiting large language models, in: 2023 49th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2023, pp. 53–60. doi:10.1109/SEAA60479.2023.00017.
[10] G. De Vito, S. Lambiase, F. Palomba, F. Ferrucci, Meet C4SE: Your new collaborator for software engineering tasks, in: 2023 49th Euromicro Conference on Software Engineering and Advanced Applications (SEAA), 2023, pp. 235–238. doi:10.1109/SEAA60479.2023.00044.
[11] A. Ahmad, M. Waseem, P. Liang, M. Fahmideh, M. S. Aktar, T. Mikkonen, Towards human-bot collaborative software architecting with ChatGPT, in: Proceedings of the 27th International Conference on Evaluation and Assessment in Software Engineering, 2023, pp. 279–285.
[12] J. T. Liang, C. Yang, B. A. Myers, Understanding the usability of AI programming assistants, arXiv preprint arXiv:2303.17125 (2023).
[13] M. P. Robillard, A. Marcus, C. Treude, G. Bavota, O. Chaparro, N. Ernst, M. A. Gerosa, M. Godfrey, M. Lanza, M. Linares-Vásquez, G. C. Murphy, L. Moreno, D. Shepherd, E. Wong, On-demand developer documentation, in: 2017 IEEE International Conference on Software Maintenance and Evolution (ICSME), 2017, pp. 479–483. doi:10.1109/ICSME.2017.17.
[14] E. Aghajani, C. Nagy, M. Linares-Vásquez, L. Moreno, G. Bavota, M. Lanza, D. C. Shepherd, Software documentation: The practitioners' perspective, in: Proceedings of the ACM/IEEE 42nd International Conference on Software Engineering, ICSE '20, Association for Computing Machinery, New York, NY, USA, 2020, pp. 590–601. doi:10.1145/3377811.3380405.
[15] P. Vaithilingam, T. Zhang, E. L. Glassman, Expectation vs. experience: Evaluating the usability of code generation tools powered by large language models, in: Extended Abstracts of the 2022 CHI Conference on Human Factors in Computing Systems, CHI EA '22, Association for Computing Machinery, New York, NY, USA, 2022. doi:10.1145/3491101.3519665.
[16] Q. Guo, J. Cao, X. Xie, S. Liu, X. Li, B. Chen, X. Peng, Exploring the potential of ChatGPT in automated code refinement: An empirical study, arXiv preprint arXiv:2309.08221 (2023).
[17] J. Leinonen, P. Denny, S. MacNeil, S. Sarsa, S. Bernstein, J. Kim, A. Tran, A. Hellas, Comparing code explanations created by students and large language models, arXiv preprint arXiv:2304.03938 (2023).
[18] A. Hellas, J. Leinonen, S. Sarsa, C. Koutcheme, L. Kujanpää, J. Sorva, Exploring the responses of large language models to beginner programmers' help requests, arXiv preprint arXiv:2306.05715 (2023).
[19] J. White, Q. Fu, S. Hays, M. Sandborn, C. Olea, H. Gilbert, A. Elnashar, J. Spencer-Smith, D. C. Schmidt, A prompt pattern catalog to enhance prompt engineering with ChatGPT, arXiv preprint arXiv:2302.11382 (2023).
[20] E. A. van Dis, J. Bollen, W. Zuidema, R. van Rooij, C. L. Bockting, ChatGPT: five priorities for research, Nature 614 (2023) 224–226.
[21] S. Arora, A. Narayan, M. F. Chen, L. Orr, N. Guha, K. Bhatia, I. Chami, F. Sala, C. Ré, Ask me anything: A simple strategy for prompting language models, arXiv preprint arXiv:2210.02441 (2022).
[22] U. Lee, H. Jung, Y. Jeon, Y. Sohn, W. Hwang, J. Moon, H. Kim, Few-shot is enough: exploring ChatGPT prompt engineering method for automatic question generation in English education, Education and Information Technologies (2023) 1–33.
[23] S. Jalil, S. Rafi, T. D. LaToza, K. Moran, W. Lam, ChatGPT and software testing education: Promises and perils, in: 2023 IEEE International Conference on Software Testing, Verification and Validation Workshops (ICSTW), 2023, pp. 4130–4137. doi:10.1109/ICSTW58534.2023.00078.
[24] R. K. Yin, Case study research and applications, volume 6, Sage, Thousand Oaks, CA, 2018.
[25] C. Wohlin, P. Runeson, M. Höst, M. C. Ohlsson, B. Regnell, A. Wesslén, Experimentation in software engineering, Springer Science & Business Media, 2012.
[26] B. Bruegge, A. H. Dutoit, Object-oriented software engineering: using UML, patterns, and Java, 2009.
[27] J. Cámara, J. Troya, L. Burgueño, A. Vallecillo, On the assessment of generative AI in modeling tasks: an experience report with ChatGPT and UML, Software and Systems Modeling 22 (2023) 781–793. doi:10.1007/s10270-023-01105-5.