<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Large Language Models in Software Engineering: A Focus on Issue Report Classification and User Acceptance Test Generation</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Gabriele</forename><surname>De Vito</surname></persName>
							<email>gadevito@unisa.it</email>
							<affiliation key="aff0">
								<orgName type="institution">Università degli Studi di Salerno</orgName>
								<address>
									<settlement>Salerno</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Luigi</forename><forename type="middle">Libero Lucio</forename><surname>Starace</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Università degli Studi di Napoli Federico II</orgName>
								<address>
									<settlement>Naples</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sergio</forename><surname>Di Martino</surname></persName>
							<affiliation key="aff1">
								<orgName type="institution">Università degli Studi di Napoli Federico II</orgName>
								<address>
									<settlement>Naples</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Filomena</forename><surname>Ferrucci</surname></persName>
							<email>fferrucci@unisa.it</email>
							<affiliation key="aff0">
								<orgName type="institution">Università degli Studi di Salerno</orgName>
								<address>
									<settlement>Salerno</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Fabio</forename><surname>Palomba</surname></persName>
							<email>fpalomba@unina.it</email>
							<affiliation key="aff0">
								<orgName type="institution">Università degli Studi di Salerno</orgName>
								<address>
									<settlement>Salerno</settlement>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Large Language Models in Software Engineering: A Focus on Issue Report Classification and User Acceptance Test Generation</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">142CFC2A4C0AFE6E3B35667E75A56040</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:56+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Large Language Models</term>
					<term>Vector Databases</term>
					<term>Issue Report Labeling</term>
					<term>User Acceptance Test Generation</term>
					<term>Software Engineering</term>
					<term>ORCID: 0000-0002-1153-1566 (G. De Vito)</term>
					<term>0000-0001-7945-9014 (L. L. L. Starace)</term>
					<term>0000-0002-1019-9004 (S. Di Martino)</term>
					<term>0000-0002-0975-8972 (F. Ferrucci)</term>
					<term>0000-0001-9337-5116 (F. Palomba)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In recent years, Large Language Models (LLMs) have emerged as powerful tools capable of understanding and generating natural language text and source code with remarkable proficiency. Leveraging this capability, we are currently investigating the potential of LLMs to streamline software development processes by automating two key tasks: issue report classification and test scenario generation. For issue report classification, the challenge lies in accurately categorizing and prioritizing incoming bug reports or feature requests. By employing LLMs, we aim to develop models that can efficiently classify issue reports, facilitating prompt response and resolution by software development teams. Test scenario generation involves the automatic generation of test cases to validate software functionality. In this context, LLMs offer the potential to analyze requirements documents, user stories, or other forms of textual input to automatically generate comprehensive test scenarios, reducing the manual effort required in test case creation. In this paper, we outline our research objectives, methodologies, and anticipated contributions to these topics in the field of software engineering. Through empirical studies and experimentation, we seek to assess the effectiveness and feasibility of integrating LLMs into existing software development workflows. By shedding light on the opportunities and challenges associated with LLMs in software engineering, this paper aims to pave the way for future advancements in this rapidly evolving domain.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In recent years, the field of software engineering has witnessed a paradigm shift with the emergence of Large Language Models (LLMs), such as OpenAI's GPT (Generative Pre-trained Transformer) series <ref type="bibr" target="#b0">[1]</ref> or LLaMA <ref type="bibr" target="#b1">[2]</ref>. These advanced Natural Language Processing (NLP) models have demonstrated remarkable capabilities in understanding and generating natural language text and source code, sparking widespread interest in their potential applications across various domains. Among these applications, the introduction of LLMs in software engineering holds significant promise for revolutionizing traditional practices and enhancing the efficiency of software development processes <ref type="bibr" target="#b2">[3]</ref>.</p><p>This paper aims to outline our ongoing research focused on harnessing the power of LLMs for two key tasks in software engineering: issue report classification and test case generation. These tasks represent critical components of the software development lifecycle, with implications for both the quality of software products and the productivity of development teams. By exploiting the capabilities of LLMs, we seek to address challenges inherent in these tasks and explore opportunities for automation and optimization. Issue report classification is a fundamental aspect of software maintenance and bug tracking, involving the categorization and prioritization of incoming issue reports, such as bug reports or feature requests <ref type="bibr" target="#b3">[4]</ref>. Traditionally, this process has relied heavily on manual intervention, leading to bottlenecks in response time and resource allocation. 
Through our research, we aim to develop and evaluate LLM-based approaches for automating issue report classification, with the goal of improving the efficiency and accuracy of this critical task.</p><p>User Acceptance Test (UAT) generation is another area of focus in our research, where the objective is to automatically generate test cases that comprehensively validate software functionality. Manual creation of test cases can be time-consuming and error-prone, especially in complex software systems with numerous features and dependencies. By leveraging LLMs, we aim to explore methods for automatically generating test cases from textual artifacts, such as requirements documents or use cases, thereby streamlining the testing process and reducing manual effort.</p><p>The remainder of this paper is structured as follows. In Section 2, we outline the research activities we are currently carrying out in the context of issue report labeling, while in Section 3, we focus on our research on automatic user acceptance test generation. Lastly, in Section 4, we give closing remarks and outline future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">LLMs for Issue Report Classification</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Problem Description</head><p>In collaborative Software Engineering, teams work together to develop and maintain software products. This collaboration involves various stakeholders, including developers, testers, project managers, and end-users, who contribute to different stages of the software development lifecycle. Throughout this process, issue reports play a crucial role in identifying, documenting, and addressing problems or requested changes within the software <ref type="bibr" target="#b4">[5]</ref>.</p><p>Issue reports, which are often managed by dedicated issue-tracking software <ref type="bibr" target="#b5">[6]</ref>, are formalized descriptions of change requests or issues encountered by stakeholders or identified during testing. These reports typically consist of natural language text written by stakeholders, possibly including details such as the nature of the problem, steps to reproduce it, expected and observed software behaviour, and any relevant screenshots, error messages, or logs. Issue reports serve as a key means of communication between end-users or stakeholders and the development team, providing essential feedback on the functionality, usability, and performance of the software product.</p><p>Issue report classification is a fundamental aspect of software maintenance and bug tracking, involving the categorization and prioritization of incoming issue reports, such as bug reports, feature requests, or documentation-related inquiries <ref type="bibr" target="#b6">[7]</ref>. Misclassifying these reports can lead to misallocated resources, delayed bug fixes, and overall inefficiencies in the software development lifecycle. Relying exclusively on manual intervention for this classification task may introduce bottlenecks in response time and resource allocation. 
Moreover, delegating the issue classification task to the stakeholders who submit the issue reports also often results in misclassified reports <ref type="bibr" target="#b7">[8,</ref><ref type="bibr" target="#b3">4]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">State of the art</head><p>Different approaches have been proposed in the literature to address these challenges. Antoniol et al. <ref type="bibr" target="#b8">[9]</ref> proposed using machine learning techniques (alternating decision trees, naive Bayes classifiers, and logistic regression) to automatically classify issues in bug tracking systems as either bugs (corrective maintenance) or non-bugs (other activities). The technique achieves classification accuracy between 77% and 82%, highlighting the potential for automated issue routing. However, the proposed approach is limited by its focus on three open-source systems and the manual classification process for creating the training dataset. With the same aim, Zhou et al. <ref type="bibr" target="#b9">[10]</ref> proposed an approach that combines text mining and data mining techniques to identify corrective bug reports in software systems, aiming to reduce misclassification noise and enhance bug prediction accuracy. Empirical studies on ten large open-source projects demonstrated its effectiveness over baseline methods and individual classifiers. Nevertheless, the approach's generalizability to commercial projects and its dependence on manual training data classification remain open issues. Kallis et al. <ref type="bibr" target="#b4">[5]</ref> introduced Ticket Tagger, a GitHub app that automates the issue labeling process using a machine-learning model, specifically fastText, to classify issues as bug reports, enhancements, or questions based on their titles and descriptions. The evaluation on a dataset of 30,000 GitHub issues demonstrated high precision and recall across categories. 
However, it faced challenges with false positives in questions and false negatives in enhancements, indicating room for improvement in handling diverse linguistic patterns in issue descriptions.</p><p>LLMs have also proven effective for the issue report classification problem <ref type="bibr" target="#b10">[11,</ref><ref type="bibr" target="#b11">12,</ref><ref type="bibr" target="#b12">13]</ref>. Nonetheless, Colavito et al. observed that the performance of these models is influenced by inconsistent and noisy labels, which are common in crowd-sourced datasets <ref type="bibr" target="#b11">[12,</ref><ref type="bibr" target="#b13">14]</ref>. They proposed leveraging GPT-like Large Language Models (LLMs) for automating issue labelling in software projects, demonstrating that these models can achieve performance comparable to state-of-the-art BERT-like models without fine-tuning. However, their experiment's scope is limited, relying on a small, manually verified subset of 400 GitHub issues extracted from the well-known nlbse dataset <ref type="bibr" target="#b14">[15]</ref>, which contains more than 1.4M issues. This may affect the generalizability of the findings across more extensive and diverse datasets. Furthermore, a risk of misclassification can stem from the approach employed to deal with issues that are too long to fit within the LLM context-size limit. Indeed, the proposed approach simply truncates the reports, thus causing a loss of potentially valuable information.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Proposed Approach</head><p>The approach we are currently investigating for issue report classification is based on leveraging LLMs with a dynamic few-shot prompting strategy, with the introduction of a more advanced summarization method to manage issues that are too long to fit within the context of the LLM, and the targeted selection of few-shot examples, achieved using Vector Databases. An overview of our approach is presented in Figure <ref type="figure" target="#fig_0">1</ref> and described as follows.</p><p>In Phase 1, we deal with issues that are too long to fit within the LLM context. In such cases, we employ the MapReduce programming model to summarize relevant data efficiently in parallel. In more detail, we partition the large issue report into smaller, manageable text chunks. Each chunk is then processed in parallel and summarized by an LLM. The results for each chunk are then combined to obtain the final, summarized report.</p><p>In Phase 2, our approach aims to select, as few-shot examples, issue reports that are more "relevant" with respect to the one currently being classified. To this end, we leverage a vector database such as Milvus 1 , in which previously-labelled issue reports are stored as vector representations. These vector representations capture the semantic meaning and context of the issue reports in a high-dimensional space, and a similar vector-based representation of issues has also been used in prior works on issue report labelling <ref type="bibr" target="#b4">[5,</ref><ref type="bibr" target="#b6">7]</ref>. We then perform a similarity search between the vector representation of the current issue report to be labelled and those of previously-labelled issue reports in the vector database. This helps us identify few-shot examples that are more relevant and share common characteristics with the current issue report. 
Once the examples have been identified, we craft a few-shot prompt using state-of-the-art prompt engineering strategies <ref type="bibr" target="#b15">[16]</ref>, and then we present the prompt to the LLM for classification (see Phase 3 in Figure <ref type="figure" target="#fig_0">1</ref>). We envision that providing the right number of relevant examples and additional context to the LLMs will further enhance their promising issue report labelling capabilities.<note place="foot" n="1">Milvus. https://milvus.io/community</note></p></div>
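The three phases described above can be sketched in Python as follows. This is a minimal, illustrative sketch, not the authors' implementation: `summarize` stands in for a per-chunk LLM call, a toy bag-of-words embedding and in-memory list stand in for the neural encoder and the Milvus index, and all names (`map_reduce_summarize`, `top_k_examples`, `build_prompt`) are hypothetical.

```python
# Sketch of the three-phase classification pipeline. LLM calls and the
# vector database are replaced by toy stand-ins for illustration.
from collections import Counter

CONTEXT_LIMIT = 200  # hypothetical context budget, in words


def chunk(text, size=50):
    # Partition a long report into manageable word chunks.
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def summarize(chunk_text):
    # Placeholder for the per-chunk LLM summarization call (the "map" step).
    return " ".join(chunk_text.split()[:10])


def map_reduce_summarize(report):
    # Phase 1: if the report exceeds the context budget, summarize its
    # chunks (in parallel, in a real system) and recombine the results.
    if len(report.split()) <= CONTEXT_LIMIT:
        return report
    return " ".join(summarize(c) for c in chunk(report))


def embed(text):
    # Toy bag-of-words embedding; a real system would use a neural encoder.
    return Counter(text.lower().split())


def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = sum(v * v for v in a.values()) ** 0.5
    nb = sum(v * v for v in b.values()) ** 0.5
    return dot / (na * nb) if na and nb else 0.0


def top_k_examples(store, query, k=2):
    # Phase 2: similarity search over previously labelled reports.
    q = embed(query)
    ranked = sorted(store, key=lambda ex: cosine(q, embed(ex["text"])),
                    reverse=True)
    return ranked[:k]


def build_prompt(examples, report):
    # Phase 3: assemble the few-shot prompt from the retrieved examples.
    shots = "\n".join(f"Issue: {ex['text']}\nLabel: {ex['label']}"
                      for ex in examples)
    return f"{shots}\nIssue: {report}\nLabel:"
```

In a real deployment the labelled store would be a Milvus collection queried by vector similarity, and the resulting prompt would be sent to the LLM for the final label.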
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Assessment Strategy</head><p>To assess the effectiveness of our LLM-based approach for issue report classification, we propose an empirical evaluation strategy leveraging state-of-the-art LLMs such as OpenAI's GPT-4 <ref type="bibr" target="#b16">[17]</ref>, focusing on accuracy, precision, recall, and F1-score. The strategy utilizes the "nlbse 2023" dataset <ref type="bibr" target="#b14">[15]</ref>, which will be indexed into a vector database to facilitate the extraction of vector representations for selecting relevant few-shot examples for the LLM. This approach avoids fine-tuning the LLM, aiming to leverage its pre-trained capabilities to classify issue reports accurately. The assessment will evaluate the LLM-based method on a test set provided in the "nlbse 2023" dataset, which serves as a gold standard. The evaluation will focus on the metrics reported above to comprehensively evaluate the LLM's effectiveness in classifying issue reports. Classification performance will be measured using the F1-score over all four classes (micro-averaged), namely bug, feature, question, and documentation. The process involves experimenting with different numbers of few-shot examples, as well as investigating different vector representations and similarity functions to use when retrieving the few-shot examples, to identify the configuration that yields the highest performance across these metrics. By conducting this evaluation, we aim to demonstrate the potential of LLMs, like GPT-4, in automating the classification of issue reports, thereby offering a scalable and efficient alternative to manual classification methods in software development workflows.</p></div>
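For concreteness, the micro-averaged F1-score over the four issue classes can be computed as sketched below. The label names follow the paper; the pooling logic is the standard definition (true positives, false positives, and false negatives are summed over all classes before computing precision and recall) and is not specific to the authors' setup.

```python
# Micro-averaged F1 over the four issue classes named in the paper.
LABELS = ["bug", "feature", "question", "documentation"]


def micro_f1(y_true, y_pred):
    # Pool TP, FP, and FN across all classes, then compute a single
    # precision/recall pair and the corresponding F1-score.
    tp = fp = fn = 0
    for label in LABELS:
        tp += sum(1 for t, p in zip(y_true, y_pred) if t == p == label)
        fp += sum(1 for t, p in zip(y_true, y_pred) if p == label and t != label)
        fn += sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

Note that in a single-label multi-class setting such as this one, the micro-averaged F1-score coincides with accuracy, since every misclassification counts once as a false positive and once as a false negative.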
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">LLMs for User Acceptance Test Generation</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Problem Description</head><p>In software development, the generation of UATs represents a critical phase within the software testing life-cycle <ref type="bibr" target="#b17">[18]</ref>. UATs are designed to ensure that software systems meet the specified requirements and work for the end-user as intended before the software is released. Traditionally, creating UATs involves translating user requirements and use cases into testable scenarios, requiring significant manual effort and domain expertise. This manual approach to generating UATs is time-consuming and prone to human error, potentially leading to gaps in test coverage or misinterpretation of requirements <ref type="bibr" target="#b17">[18]</ref>. LLMs offer a promising avenue for automating the generation of UATs from natural language descriptions of software requirements or use cases. LLMs have demonstrated remarkable capabilities in understanding and generating natural language text, suggesting their potential utility in interpreting software requirements and automatically producing corresponding UATs <ref type="bibr" target="#b18">[19,</ref><ref type="bibr" target="#b19">20]</ref>. However, the application of LLMs in this context is challenging. The inherent ambiguity and variability of natural language and the complexity of software requirements pose significant obstacles to the accurate and reliable generation of UATs. Furthermore, the non-deterministic nature of LLM outputs and the limitations related to context size and model interpretability necessitate careful consideration and adaptation of these models for UAT generation <ref type="bibr" target="#b19">[20]</ref>. The challenge lies in leveraging LLMs to convert natural language software requirements into structured UATs, requiring adapting LLMs for accurate interpretation and ensuring the UATs are comprehensive and aligned with software functionality. 
Overcoming these hurdles can streamline testing, boost efficiency, reduce manual effort, and improve software quality.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">State of the art</head><p>Several studies have explored NLP for automating test case generation, often within specific domains or formats. Nebut et al. <ref type="bibr" target="#b20">[21]</ref> automate system test case generation using UML and contracts, facing challenges with manual intensity and scalability in complex systems. Carvalho et al. <ref type="bibr" target="#b21">[22]</ref> create NAT2TEST for generating test cases from Controlled Natural Language, noting reduced efficiency due to formal model reliance. Yue et al. <ref type="bibr" target="#b22">[23]</ref> develop RTCM for converting natural language test cases into executable tests but lack comprehensive performance analysis and generalizability. Goffi et al. <ref type="bibr" target="#b23">[24]</ref> introduce Toradocu, using Javadoc comments for test oracle generation, yet it remains a prototype with limitations in processing complex conditions. Silva et al. <ref type="bibr" target="#b24">[25]</ref> offer a test case generation strategy using Colored Petri Nets but do not address requirement completeness and consistency, risking state explosion issues. Allala et al. <ref type="bibr" target="#b25">[26]</ref> propose a method integrating MDE with NLP for converting user requirements into test cases, still in its initial phase and validated on a small sample. Fischbach et al. <ref type="bibr" target="#b26">[27]</ref> explore test case automation from agile acceptance criteria, finding natural language complexity a barrier to full automation. Wang et al. <ref type="bibr" target="#b27">[28]</ref> develop UMTG for system-level test case creation using natural language and domain models tailored for embedded systems and facing scalability challenges.</p><p>Despite the promising results, many limitations persist across the board. 
These limitations primarily revolve around the scalability of the approaches in complex systems, the efficiency of the processes, and the generalizability of the tools and methods to different domains or types of software systems. These limitations underscore the need for further research to integrate natural language requirements more seamlessly into the test generation process.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Proposed Approach</head><p>Our approach to automating UAT generation involves analyzing requirements expressed through use cases, specified using natural language. It consists of two primary phases: 1) Identifying the list of test cases from a use case, and 2) Elaborating the details of each test case. Throughout this process, we employ LLMs, particularly GPT-4 <ref type="bibr" target="#b16">[17]</ref>, as a tool to interpret and translate the use cases into comprehensive UAT documentation.</p><p>The initial phase tackles LLMs' context limits and nondeterminism. Indeed, long textual descriptions of use cases that exceed the context limit could result in incomplete responses. At the same time, the model's nondeterminism might produce inconsistent results, risking the generation of irrelevant test cases. To mitigate these challenges, we designed the prompt by leveraging the few-shot learning technique and providing precise and clear instructions for the LLM. The outcome of the identification phase is a list of test cases structured in JSON format derived from the provided text description of the use case. Each test case includes a unique identifier, a clear and concise description, the flow type, an indicator of whether a separate UAT is necessary, and an indicator of whether the test case is explicitly present in the original use case.</p><p>The second phase focuses on generating the details of the identified UATs. The goal is to produce a test case aligned with the use case scenario it refers to and sufficiently detailed to guide the test's execution without ambiguity. The details of each test case are structured in a JSON format that facilitates understanding and implementation of the tests, containing information such as preconditions, actors, and steps, including inputs and expected results. 
Since each test case is independent of the others, multiple requests can be processed in parallel, significantly reducing overall execution time.</p><p>To mitigate the LLM's non-determinism, we operated in multiple directions. On one hand, we focused on configuring GPT-4's hyperparameters effectively. In preliminary experiments, we found that setting the temperature, presence_penalty, and frequency_penalty hyperparameters to 0, the best_of hyperparameter to 1, and the top_p hyperparameter to 1, as recommended by OpenAI, yielded the most deterministic outcomes.</p><p>On the other hand, to ensure GPT-4 generates specific and relevant outputs, prompts were meticulously crafted with clear, detailed instructions and examples of desired outputs, adopting a "show, do not tell" strategy <ref type="bibr" target="#b15">[16]</ref>. This method helps the model grasp the expected format and content more accurately. Prompts and configurations underwent iterative refinements based on feedback to enhance result consistency. Finally, outputs were rigorously evaluated for consistency and requirement adherence, allowing for adjustments in response to identified non-determinism patterns.</p></div>
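The two artifacts described above can be sketched as follows: the JSON shape of an identified test case and the decoding hyperparameters used to push GPT-4 toward deterministic output. The hyperparameter names and values are those stated in the text; the JSON field names (`flow_type`, `requires_separate_uat`, and so on) are illustrative, not the authors' exact schema.

```python
# Hypothetical sketch of the test-case JSON structure and the deterministic
# decoding configuration described in the text.
import json

# Hyperparameter settings reported as yielding the most deterministic output.
DETERMINISTIC_PARAMS = {
    "temperature": 0,        # no sampling randomness
    "presence_penalty": 0,
    "frequency_penalty": 0,
    "best_of": 1,
    "top_p": 1,
}

# Illustrative output of Phase 1 (identification) enriched by Phase 2
# (detail elaboration); field names are assumptions.
test_case = {
    "id": "TC-01",
    "description": "Successful login with valid credentials",
    "flow_type": "main",              # main vs. alternative use-case flow
    "requires_separate_uat": True,
    "explicit_in_use_case": True,
    # Phase 2 adds the executable detail:
    "preconditions": ["A registered user account exists"],
    "actors": ["End user"],
    "steps": [
        {"input": "Enter valid username and password; press Login",
         "expected_result": "The user is redirected to the dashboard"},
    ],
}

# Each test case is self-contained, so Phase 2 requests for different test
# cases can be serialized and dispatched in parallel without shared state.
payload = json.dumps(test_case)
```

Because every test-case object is independent, a batch of such payloads can be sent as concurrent requests, which is what enables the parallel Phase 2 processing mentioned above.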
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.4.">Assessment Strategy</head><p>To evaluate the approach, we will design and carry out an empirical experiment involving software engineering professionals. These participants will be divided into two groups: one utilizing our automated approach and the other resorting to manual methods for UAT generation. This design allows for a direct comparison of the outcomes, providing valuable insights into the effectiveness of the approach. By ensuring the completeness, clarity, understandability, and correctness of the generated UATs, we aim to streamline the process, enhance test coverage, and ultimately contribute to the development of higher-quality software products. Feedback from the participants will also be collected to gain insights into the usability and practicality of the approach in real-world software development scenarios. This feedback will be invaluable in refining the method and identifying areas for further research and development.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusions</head><p>In this paper, we discuss the potential of leveraging LLMs to address two significant challenges in software engineering: issue report classification and UAT generation. By employing advanced techniques such as vector databases and few-shot learning with LLMs, we aim to enhance the efficiency and accuracy of these essential tasks. We envision that our approaches could significantly improve upon current manual and automated methods, though challenges related to natural language ambiguities and model determinism remain. Moving forward, we will focus on refining our methodologies and expanding LLM applications within software engineering to streamline development workflows and elevate software quality. Our work suggests a promising future for integrating LLMs in the field, with substantial advancements in efficiency and product quality.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Issue Report Classification Process.</figDesc><graphic coords="3,110.13,84.19,375.01,131.43" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: UAT Generation Process.</figDesc><graphic coords="5,110.13,84.19,375.02,132.52" type="bitmap" /></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was partially funded by the NextGenerationEu-PNRR MUR Project FAIR (Future Artificial Intelligence Research), grant ID PE0000013.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Achiam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Adler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ahmad</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Akkaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">L</forename><surname>Aleman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Almeida</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Altenschmidt</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Altman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Anadkat</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2303.08774</idno>
		<title level="m">Gpt-4 technical report</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">H</forename><surname>Touvron</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Martin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Stone</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2307.09288</idno>
		<title level="m">Llama 2: Open foundation and fine-tuned chat models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Application of large language models to software engineering tasks: Opportunities, risks, and implications</title>
		<author>
			<persName><forename type="first">I</forename><surname>Ozkaya</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Software</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="page" from="4" to="8" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">G</forename><surname>Colavito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Lanubile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Novielli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Quaranta</surname></persName>
		</author>
		<title level="m">Leveraging GPT-like LLMs to automate issue labeling</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Ticket tagger: Machine learning driven issue classification</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kallis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the IEEE Int. Conf. on Software Maintenance and Evolution (ICSME), IEEE</title>
				<meeting>of the IEEE Int. Conf. on Software Maintenance and Evolution (ICSME), IEEE</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="406" to="409" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Situational awareness: personalizing issue tracking systems</title>
		<author>
			<persName><forename type="first">O</forename><surname>Baysal</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">35th Intern. Conf. on Software Engineering (ICSE), IEEE</title>
				<imprint>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="1185" to="1188" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Predicting issue types on GitHub</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kallis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Science of Computer Programming</title>
		<imprint>
			<biblScope unit="volume">205</biblScope>
			<biblScope unit="page">102598</biblScope>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">It&apos;s not a bug, it&apos;s a feature: how misclassification impacts bug prediction</title>
		<author>
			<persName><forename type="first">K</forename><surname>Herzig</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2013 35th Intern. Conf. on Software Engineering (ICSE)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="392" to="401" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Is it a bug or an enhancement? a text-based approach to classify change requests</title>
		<author>
			<persName><forename type="first">G</forename><surname>Antoniol</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 2008 Conf. of the Center for Advanced Studies on Collaborative Research</title>
				<meeting>of the 2008 Conf. of the Center for Advanced Studies on Collaborative Research</meeting>
		<imprint>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="304" to="318" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Combining text mining and data mining for bug report classification</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Software: Evolution and Process</title>
		<imprint>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="150" to="176" />
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Issue-Labeler: an ALBERT-based Jira plugin for issue classification</title>
		<author>
			<persName><forename type="first">W</forename><surname>Alhindi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE/ACM 10th Intern. Conf. on Mobile Software Engineering and Systems (MOBILESoft), IEEE</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="40" to="43" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Issue report classification using pre-trained language models</title>
		<author>
			<persName><forename type="first">G</forename><surname>Colavito</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 1st Int. Workshop on Nat. Lang.-based Softw. Eng</title>
				<meeting>1st Int. Workshop on Nat. Lang.-based Softw. Eng</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="29" to="32" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Predicting the objective and priority of issue reports in software repositories</title>
		<author>
			<persName><forename type="first">M</forename><surname>Izadi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Empirical Software Engineering</title>
		<imprint>
			<biblScope unit="volume">27</biblScope>
			<biblScope unit="page">50</biblScope>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Few-shot learning for issue report classification</title>
		<author>
			<persName><forename type="first">G</forename><surname>Colavito</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 2023 IEEE/ACM 2nd Int. Workshop on NLBSE, IEEE</title>
				<meeting>of the 2023 IEEE/ACM 2nd Int. Workshop on NLBSE, IEEE</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="16" to="19" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">The NLBSE&apos;23 tool competition</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kallis</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of The 2nd Intern. Workshop on Natural Language-based Software Engineering (NLBSE&apos;23)</title>
				<meeting>The 2nd Intern. Workshop on Natural Language-based Software Engineering (NLBSE&apos;23)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<title level="m" type="main">Prompt Engineering For ChatGPT: A Quick Guide To Techniques, Tips, And Best Practices</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ekin</surname></persName>
		</author>
		<idno type="DOI">10.36227/techrxiv.22683919.v1</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">TechRxiv preprint</note>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<title level="m" type="main">GPT-4 technical report</title>
		<author>
			<persName><surname>OpenAI</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2303.08774</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">Object-Oriented Software Engineering Using UML, Patterns, and Java</title>
		<author>
			<persName><forename type="first">B</forename><surname>Bruegge</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">H</forename><surname>Dutoit</surname></persName>
		</author>
		<imprint>
			<publisher>Prentice Hall</publisher>
			<date type="published" when="2009">2009</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">ChatGPT for good? On opportunities and challenges of large language models for education</title>
		<author>
			<persName><forename type="first">E</forename><surname>Kasneci</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Sessler</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Learning and Individual Differences</title>
		<imprint>
			<biblScope unit="volume">103</biblScope>
			<biblScope unit="page">102274</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<monogr>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">X</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Tang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2303.18223</idno>
		<title level="m">A survey of large language models</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<title level="a" type="main">Automatic test generation: a use case driven approach</title>
		<author>
			<persName><forename type="first">C</forename><surname>Nebut</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Fleurey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Le Traon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J.-M</forename><surname>Jezequel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Software Engineering</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="page" from="140" to="155" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<analytic>
		<title level="a" type="main">NAT2TEST tool: From natural language requirements to test cases based on CSP</title>
		<author>
			<persName><forename type="first">G</forename><surname>Carvalho</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Software Engineering and Formal Methods</title>
				<editor>
			<persName><forename type="first">R</forename><surname>Calinescu</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Rumpe</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="283" to="290" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">RTCM: A natural language based, automated, and practical test case generation framework</title>
		<author>
			<persName><forename type="first">T</forename><surname>Yue</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2015 International Symposium on Software Testing and Analysis</title>
				<meeting>the 2015 International Symposium on Software Testing and Analysis</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2015">2015</date>
			<biblScope unit="page" from="397" to="408" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<analytic>
		<title level="a" type="main">Automatic generation of oracles for exceptional behaviors</title>
		<author>
			<persName><forename type="first">A</forename><surname>Goffi</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 25th Intern. Symposium on Software Testing and Analysis</title>
				<meeting>the 25th Intern. Symposium on Software Testing and Analysis</meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="213" to="224" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Test case generation from natural language requirements using CPN simulation</title>
		<author>
			<persName><forename type="first">B</forename><forename type="middle">C F</forename><surname>Silva</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Formal Methods: Foundations and Applications</title>
				<editor>
			<persName><forename type="first">M</forename><surname>Cornélio</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">B</forename><surname>Roscoe</surname></persName>
		</editor>
		<meeting><address><addrLine>Cham</addrLine></address></meeting>
		<imprint>
			<publisher>Springer International Publishing</publisher>
			<date type="published" when="2016">2016</date>
			<biblScope unit="page" from="178" to="193" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Towards transforming user requirements to test cases using MDE and NLP</title>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">C</forename><surname>Allala</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 43rd Annual Computer Software and Applications Conference</title>
				<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="350" to="355" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b26">
	<analytic>
		<title level="a" type="main">SpecMate: Automated creation of test cases from acceptance criteria</title>
		<author>
			<persName><forename type="first">J</forename><surname>Fischbach</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE 13th Int. Conf. on Software Testing, Validation and Verification</title>
				<imprint>
			<date type="published" when="2020">2020</date>
			<biblScope unit="page" from="321" to="331" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b27">
	<analytic>
		<title level="a" type="main">Automatic generation of acceptance test cases from use case specifications: An NLP-based approach</title>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Software Engineering</title>
		<imprint>
			<biblScope unit="volume">48</biblScope>
			<biblScope unit="page" from="585" to="616" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
