<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Large Language Models for Issue Report Classification</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Giuseppe</forename><surname>Colavito</surname></persName>
							<email>giuseppe.colavito@uniba.it</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Bari &quot;Aldo Moro&quot;</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Filippo</forename><surname>Lanubile</surname></persName>
							<email>filippo.lanubile@uniba.it</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Bari &quot;Aldo Moro&quot;</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Nicole</forename><surname>Novielli</surname></persName>
							<email>nicole.novielli@uniba.it</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Bari &quot;Aldo Moro&quot;</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Luigi</forename><surname>Quaranta</surname></persName>
							<email>luigi.quaranta@uniba.it</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Bari &quot;Aldo Moro&quot;</orgName>
								<address>
									<country key="IT">Italy</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Large Language Models for Issue Report Classification</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">04AA1DFF1985DAE2BD3069603106C02D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:55+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Issue classification</term>
					<term>Large Language Models</term>
					<term>Generative AI</term>
					<term>Software Maintenance and Evolution</term>
					<term>Few-Shot Learning</term>
					<term>0000-0003-3871-401X (G. Colavito)</term>
					<term>0000-0003-3373-7589 (F. Lanubile)</term>
					<term>0000-0003-1160-2608 (N. Novielli)</term>
					<term>0000-0002-9221-0739 (L. Quaranta)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Effective issue classification is crucial for efficient software project management. However, labels assigned to issues are often inconsistent, which can negatively impact the performance of supervised classification models. In this work, we investigate how label consistency and training data size affect automatic issue classification. We first evaluate a few-shot learning approach on a manually validated dataset and compare it to fine-tuning on a larger crowd-sourced set. The results show that our approach achieves higher accuracy when trained and tested on consistent labels. We then examine zero-shot classification using GPT-3.5, finding that its performance is comparable to supervised models despite having no fine-tuning. This suggests that generative models can help classify issues when annotated data is limited. Overall, our findings provide insights into balancing data quantity and quality for issue classification.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Collaborative software development involves complex processes and activities to effectively support software development and maintenance. In this context, issue-tracking systems are widely adopted to manage requests for changes (such as bug fixes or product enhancements) as well as requests for support from users, and are regarded as essential tools for maintainers to efficiently manage software evolution activities.</p><p>Issue reports organized in such systems typically contain information such as an identifier, a description, the author, the issue status (e.g., open, assigned, closed), a comment thread, and a label indicating the type of issue, such as bug, enhancement, or support. Effective labeling of issue reports is of paramount importance to support prioritization and decision-making. Unfortunately, label misuse is a common problem, as submitters often confuse improvement requests with bugs and vice versa <ref type="bibr" target="#b0">[1]</ref>. For example, Herzig et al. <ref type="bibr" target="#b1">[2]</ref> reported that approximately 33.8% of all issue reports are incorrectly labeled. To avoid dealing with incorrect labels, automated classification methods have been proposed. Automatic issue classification can enable effective issue management and prioritization <ref type="bibr" target="#b2">[3]</ref>, without the need to instruct developers on how to assign labels correctly.</p><p>Early research on this topic proposed exploiting supervised methods that leverage text-based features for the task of automatic issue report classification <ref type="bibr" target="#b0">[1]</ref>. More recently, approaches leveraging word embeddings have emerged <ref type="bibr" target="#b3">[4,</ref><ref type="bibr" target="#b4">5,</ref><ref type="bibr" target="#b5">6,</ref><ref type="bibr" target="#b6">7]</ref>. 
In particular, approaches based on BERT <ref type="bibr" target="#b7">[8]</ref> and its variants achieved state-of-the-art performance <ref type="bibr" target="#b8">[9,</ref><ref type="bibr" target="#b9">10,</ref><ref type="bibr" target="#b10">11]</ref>.</p><p>In our previous work, we conducted an empirical study to investigate to what extent we can leverage pre-trained language models for automatic issue labeling <ref type="bibr" target="#b9">[10]</ref>. We experimented with a dataset of more than 800K issue reports from GitHub open-source software projects labeled by project contributors as bug, enhancement, or question <ref type="bibr" target="#b8">[9]</ref>. We fine-tuned the BERT <ref type="bibr" target="#b7">[8]</ref> variant RoBERTa <ref type="bibr" target="#b11">[12]</ref>, achieving state-of-the-art performance (F1 = 0.8591).</p><p>Our manual error analysis revealed that the main cause of the misclassification of issues is label inconsistency across different projects. Also, several issue reports in the dataset were tagged with more than one label, which is indeed a source of noise. This evidence is in line with previous studies reporting the impact of data quality on the performance of machine learning models <ref type="bibr" target="#b12">[13]</ref>. Informed by the results of our error analysis and by findings of previous research, we formulate the following research question:</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>RQ1: To what extent does label consistency impact the performance of supervised issue classification models?</head><p>To address it, we investigate the efficacy of few-shot learning for training robust classifiers using a small training dataset with manually validated labels. Specifically, we experiment with SETFIT, an effective methodology for fine-tuning transformer-based models using few-shot learning <ref type="bibr" target="#b13">[14]</ref>, achieving promising results <ref type="bibr" target="#b14">[15]</ref>.</p><p>Still, manual annotation can be costly in terms of both time and resources, even when applied to a small set of manually curated examples. Hence, the need to minimize the effort associated with data labeling remains. With the advent of recent GPT-like Large Language Models (LLMs), researchers have started investigating their potential in solving software engineering challenges <ref type="bibr" target="#b15">[16,</ref><ref type="bibr" target="#b16">17]</ref>. To better understand how GPT-like LLMs can be leveraged for automated issue labeling in the absence of training data, we formulate and investigate our second research question as follows:</p><p>RQ2: To what extent can we leverage GPT-like LLMs to classify issue reports?</p><p>To address it, we evaluate GPT3.5-turbo <ref type="bibr" target="#b17">[18]</ref> in a zero-shot learning scenario, where the model is prompted with only the task and label descriptions. We compare the performance of classifiers based on GPT-like LLMs with fine-tuned BERT-like LLMs <ref type="bibr" target="#b18">[19]</ref>.</p><p>In this paper, we discuss our ongoing work on using LLMs to address software engineering challenges, with a particular focus on the automatic classification of issue reports in a low-resource setting. 
Specifically, we summarize the findings of two recent studies in which we addressed the research questions formulated above <ref type="bibr" target="#b14">[15,</ref><ref type="bibr" target="#b18">19]</ref>. The remainder of the paper is organized as follows. In Sections 2 and 3, we describe the datasets and methodology adopted in our empirical studies, respectively. Then, we report and discuss the study results in Section 4. The paper is concluded in Section 5, where we also outline directions for future work.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Dataset</head><p>To address our research questions, we use a dataset of 400 GitHub issues labeled as bug, feature, question, and documentation. The dataset is split into two subsets of 200 issues, which we use as train and test sets, respectively. Both subsets are equally distributed and include 50 issues per class. Our dataset is obtained by manually labeling 400 randomly selected items from the dataset of 1.4M GitHub issues distributed by the NLBSE'23 tool competition organizers <ref type="bibr" target="#b19">[20]</ref>. To ensure the consistency of labels in our dataset, three annotators individually categorized each issue report based on the information in its title and body. Each issue report was assigned to two of the annotators. We observed a Cohen's 𝜅 of 0.74, which indicates a substantial level of inter-rater agreement <ref type="bibr" target="#b20">[21]</ref>. The annotators then held a joint plenary meeting to discuss and resolve the cases of disagreement. Through this procedure, we ensured the reliability and consistency of the annotations. Table <ref type="table" target="#tab_0">1</ref> presents the dataset's distribution before and after the manual labeling. The manually annotated sample is publicly available <ref type="bibr" target="#b21">[22]</ref>.</p></div>
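<div xmlns="http://www.tei-c.org/ns/1.0"><p>For illustration, Cohen's 𝜅 compares the observed agreement p_o between two annotators with the agreement p_e expected by chance: 𝜅 = (p_o - p_e) / (1 - p_e). The following is a minimal sketch in Python of that computation. The function name and the toy annotations are hypothetical and do not reproduce the annotations collected in the study.</p>
<p><code lang="python"><![CDATA[
```python
from collections import Counter


def cohen_kappa(rater_a, rater_b):
    """Cohen's kappa for two raters who labeled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty list of items")
    n = len(rater_a)
    # Observed agreement: share of items on which the two raters agree.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement, from each rater's marginal label frequencies.
    freq_a, freq_b = Counter(rater_a), Counter(rater_b)
    p_e = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if p_e == 1.0:
        # Both raters used one single label everywhere: agreement is trivial.
        return 1.0
    return (p_o - p_e) / (1 - p_e)


# Hypothetical toy annotations (NOT the study's data).
annotator_1 = ["bug", "bug", "feature", "question", "documentation", "feature"]
annotator_2 = ["bug", "feature", "feature", "question", "documentation", "feature"]
kappa = cohen_kappa(annotator_1, annotator_2)
```
]]></code></p><p>In this toy example the annotators agree on five of six items (p_o = 5/6) and the chance agreement is p_e = 10/36, so 𝜅 = 20/26.</p></div>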
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Methodology</head><p>To address our first research question, we investigate the efficacy of few-shot learning for training robust classifiers using the small manually validated training dataset described in Section 2. In particular, we train and evaluate a model based on SETFIT <ref type="bibr" target="#b13">[14]</ref> using the manually labeled train and test sets. Then we compare its performance with the one obtained by fine-tuning RoBERTa <ref type="bibr" target="#b14">[15]</ref> using the full dataset of 1.4M crowd-annotated issues <ref type="bibr" target="#b19">[20]</ref>.</p><p>To address our second research question, we compare the performance of the SETFIT classifier with the performance achieved by GPT 3.5 in a zero-shot learning scenario. We highlight that prompting is only used for GPT, while the SETFIT model is trained on the manually labeled data. Both models are evaluated on the test set partition of manually labeled issues.</p><p>Preprocessing. For our SETFIT model, we preprocess our dataset as follows. First, non-textual items, such as links, code snippets, and images, are identified and replaced with tokens (e.g., &lt;link&gt; for links) in the dataset. Next, we use the ekphrasis Text Pre-Processor<ref type="foot" target="#foot_0">1</ref> to normalize the text by detecting and replacing items such as URLs, email addresses, symbols, phone numbers, mentions, time, date, and numbers with specific tokens.</p><p>Choice of GPT-like models. Several LLMs have been proposed in the last few years, with GPT-3 <ref type="bibr" target="#b22">[23]</ref> being one of the most popular. There is a significant prevalence of studies leveraging GPT3.5-turbo <ref type="bibr" target="#b23">[24]</ref>, an instruction-tuned version of GPT-3, which is able to interact as a chatbot. For this reason, we select GPT3.5-turbo <ref type="bibr" target="#b17">[18]</ref> as representative of GPT-like LLMs. 
We experiment with several versions of GPT3.5-turbo, with varying context length and date of training. Here we only report the results of the model with the best performance. More details can be found in our original work describing this study <ref type="bibr" target="#b18">[19]</ref>.</p><p>Prompting. To instruct the model to perform the classification task, we create a prompt that includes the following items:</p><p>• Input Format: The format of the input issues, which includes a title and a body;</p><p>• Task Description: A description of the classification task to be performed, including the possible labels that can be assigned to the issues;</p><p>• Label Descriptions: A brief description of each label. Label descriptions are generated by ChatGPT and then manually reviewed to ensure they are clear and informative;</p><p>• Input Issue: The issue to be classified;</p><p>• Output format instructions: The desired output format. We ask the model for a JSON object containing a reasoning and the predicted label. This is done to inject some Chain-of-Thought reasoning into the model, as suggested in previous studies about prompting LLMs <ref type="bibr" target="#b24">[25,</ref><ref type="bibr" target="#b25">26]</ref>. However, the reasoning serves as a prompt-engineering strategy and is not used to evaluate the model.</p></div>
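<div xmlns="http://www.tei-c.org/ns/1.0"><p>The five prompt items listed above could be assembled as in the following minimal sketch. It only builds the prompt string and does not call any API. The label descriptions, the wording of each section, the function name, and the JSON key names are assumptions made for illustration; the prompt actually used in the study is described in the original work.</p>
<p><code lang="python"><![CDATA[
```python
import json

# Hypothetical label descriptions; the wording used in the study is not reproduced here.
LABEL_DESCRIPTIONS = {
    "bug": "The issue reports unexpected or incorrect behavior.",
    "feature": "The issue requests new functionality or an enhancement.",
    "question": "The issue asks for help or clarification.",
    "documentation": "The issue concerns missing or incorrect documentation.",
}


def build_prompt(title, body, label_descriptions=LABEL_DESCRIPTIONS):
    """Assemble a zero-shot classification prompt from the five listed items."""
    labels = ", ".join(label_descriptions)
    descriptions = "\n".join(
        f"- {label}: {text}" for label, text in label_descriptions.items()
    )
    # Asking for the reasoning first nudges the model to reason before answering.
    output_example = json.dumps({"reasoning": "...", "label": "..."})
    return "\n\n".join([
        "Input format: each issue has a title and a body.",
        f"Task: assign exactly one of these labels to the issue: {labels}.",
        f"Label descriptions:\n{descriptions}",
        f"Issue title: {title}\nIssue body: {body}",
        f"Output format: reply with a JSON object shaped like {output_example}, "
        "writing the reasoning before the label.",
    ])


prompt = build_prompt("App crashes on startup", "Stack trace attached.")
```
]]></code></p><p>Keeping the label set and descriptions in a single mapping means the task description and the label descriptions cannot drift apart when labels are added or renamed.</p></div>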
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Evaluation</head><p>In line with previous work <ref type="bibr" target="#b5">[6,</ref><ref type="bibr" target="#b6">7,</ref><ref type="bibr" target="#b10">11,</ref><ref type="bibr" target="#b9">10]</ref>, the evaluation of the classifiers on the test set is provided in terms of precision, recall, and F1-measure <ref type="bibr" target="#b14">[15]</ref>. For GPT-like LLMs, we parse the JSON response and extract the predicted label. When the label is not valid or the model did not follow the instructions appropriately, we discard the prediction. This parsing relies on regular expressions. Both models are tested on the manually verified test set <ref type="bibr" target="#b18">[19]</ref>.</p></div>
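<div xmlns="http://www.tei-c.org/ns/1.0"><p>The response-parsing step described above can be sketched as follows: a regular expression locates a JSON object in the model output, the predicted label is extracted, and the prediction is discarded (here, returned as None) when it is missing or invalid. The key name "label", the label spellings, and the function name are assumptions for illustration, not the study's actual implementation.</p>
<p><code lang="python"><![CDATA[
```python
import json
import re

# Assumed label set and spelling (lowercase).
VALID_LABELS = {"bug", "documentation", "feature", "question"}

# Greedy match from the first "{" to the last "}" in the response.
JSON_OBJECT = re.compile(r"\{.*\}", re.DOTALL)


def extract_label(response_text):
    """Return the predicted label, or None when the prediction must be discarded."""
    match = JSON_OBJECT.search(response_text)
    if match is None:
        return None  # the model did not return a JSON object
    try:
        payload = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None  # malformed JSON
    # "label" is a hypothetical key name for the predicted class.
    label = str(payload.get("label", "")).strip().lower()
    return label if label in VALID_LABELS else None


kept = extract_label('Sure! {"reasoning": "The app crashes.", "label": "Bug"}')
discarded = extract_label('{"reasoning": "Unclear.", "label": "spam"}')
```
]]></code></p><p>Counting how many responses yield None gives the percentage of discarded predictions for a given model version.</p></div>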
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results and Discussion</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Impact of label consistency on the classifier performance (RQ1)</head><p>In Table <ref type="table" target="#tab_1">2</ref>, we present the results obtained by training the SETFIT classifier on the hand-labeled gold standard and evaluating it on both the hand-labeled test set (a) and the full test set distributed for the challenge (c). To ensure a fair comparison, we compare the SETFIT model's performance with the performance obtained by RoBERTa on the same test set, when trained on the hand-labeled gold standard set (b1). Furthermore, we also include the performance obtained by training the RoBERTa classifier on the full train set distributed by the organizers (b2).</p><p>To assess the ability of the models to generalize to a broader dataset, we also include a comparison with the NLBSE '23 challenge baseline <ref type="bibr" target="#b19">[20]</ref> (see row (d) of the table) and the SETFIT model's performance on the challenge full test set (see row (c) of the table). It is worth noting that the SETFIT model is designed to learn from a few examples. As such, it was not feasible to train it on the raw dataset, since it is not optimized for such a setting and training would have been prohibitively time-consuming. The RoBERTa baseline, instead, is trained on the full set.</p><p>The SETFIT model achieved an F1-micro score of 0.7767 (see row (c) in Table <ref type="table" target="#tab_1">2</ref>) when trained on the manually labeled gold standard and tested on the raw test set. When trained and evaluated on the manually labeled dataset (a), SETFIT performs better than RoBERTa (b1 and b2), regardless of whether the training set used for RoBERTa is raw or manually labeled. When trained on the manually labeled dataset (b1), RoBERTa struggles to deliver good performance due to a shortage of training data. 
On the other hand, when trained on the raw dataset (b2), RoBERTa achieves competitive performance, but it is unable to outperform SETFIT (a).</p><p>As the manually labeled dataset embodies the ideal labeling criteria for classifiers, comparing SETFIT (a) and RoBERTa (b2) provides a practical scenario in which we must choose between training a classifier on a large volume of data with disregard for data quality and concentrating on a smaller portion of data while manually improving label quality. This comparison suggests that data quality might be crucial for ensuring classification accuracy. A potential approach could be to start with a few-shot classifier and gradually switch to a more powerful model like RoBERTa when a fair amount of manually verified data becomes available. By doing so, we can strike a balance between data quantity and quality, ensuring that the classifier performs effectively while minimizing the possibility of inaccurate results caused by inconsistency in the labeling.</p></div>
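<div xmlns="http://www.tei-c.org/ns/1.0"><p>As a reminder of how the F1-micro score is obtained, the sketch below pools true positives, false positives, and false negatives over all classes before computing precision, recall, and F1. In a single-label multi-class setting, every misclassified item counts once as a false positive and once as a false negative, so micro-averaged precision, recall, and F1 all coincide with accuracy. The function name and the toy labels are hypothetical and are not taken from the study's data.</p>
<p><code lang="python"><![CDATA[
```python
def micro_prf(y_true, y_pred):
    """Micro-averaged precision, recall and F1 for single-label classification."""
    labels = set(y_true) | set(y_pred)
    tp = fp = fn = 0
    # Pool the counts over all classes instead of averaging per-class scores.
    for label in labels:
        for truth, pred in zip(y_true, y_pred):
            tp += int(pred == label and truth == label)
            fp += int(pred == label and truth != label)
            fn += int(pred != label and truth == label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Hypothetical toy predictions (NOT the study's data): 3 of 5 are correct.
y_true = ["bug", "bug", "feature", "question", "documentation"]
y_pred = ["bug", "feature", "feature", "question", "question"]
precision, recall, f1 = micro_prf(y_true, y_pred)
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
```
]]></code></p><p>With three correct predictions out of five, precision, recall, F1, and accuracy are all 0.6 in this toy example.</p></div>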
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.2.">Leveraging GPT for automatic issue report classification (RQ2)</head><p>In Table <ref type="table" target="#tab_2">3</ref>, we report the classification performance of GPT compared to the SETFIT model. As explained in Section 3, we experimented with several versions of GPT 3.5 that were available at the time of the study. For a full report of the results, see Colavito et al. <ref type="bibr" target="#b18">[19]</ref>. In this paper, we consider only the 16k-0613 model, as it achieves the best combination of F1 and percentage of items discarded due to nonsensical model output. Specifically, none of the predictions from this model were discarded. We observe that, for this model, the Feature class achieves the best F1, while the Documentation class is the most problematic to identify, showing a lower recall than the other classes. Although the zero-shot GPT model performs slightly worse (F1 = 0.8155) than SETFIT (F1 = 0.8321), the two models are comparable. It is worth noting that SETFIT was fine-tuned on a portion of the issue report gold standard dataset, while GPT was evaluated in a zero-shot setting without any task-specific fine-tuning. This implies that GPT is capable of classifying issue reports with only a minor decrease in accuracy compared to fine-tuned BERT-like models. This presents a major benefit of GPT for this application, since it can perform the classification in the absence of labeled data, i.e., without the need for fine-tuning. This evidence could help maintainers of new projects, for which historical data is not available or is scarce. In such cases, API calls to GPT could be used to classify issue reports, providing a valuable tool for project management. Once the project has accumulated enough labeled data, the maintainer could switch to a fine-tuned model to improve the classification accuracy. Although this could be a viable solution for open-source projects, it is worth noting that the cost of API calls and the privacy of data could limit its practical feasibility in commercial projects. In such cases, project maintainers might consider using open-source models or building and deploying a classifier on-premise. 
Nonetheless, the construction and maintenance of LLMs are expensive both in terms of resources and time, and this constitutes a barrier to their adoption in most cases.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and Future Works</head><p>In this paper, we summarized the outcomes of our recently published studies on the use of large language models for automated issue classification. Specifically, we investigated the impact of improving data quality on issue classification performance. We trained and evaluated a model based on few-shot learning using SETFIT with a subset of manually verified data. The model achieves better performance when trained and tested on data for which label consistency was manually verified <ref type="bibr" target="#b21">[22]</ref>, compared to the RoBERTa baseline. However, RoBERTa generalizes better on the full test dataset when fine-tuned on the full crowd-sourced dataset. Furthermore, we explored the performance of GPT-like models for automatic issue classification <ref type="bibr" target="#b18">[19]</ref> to understand if we can leverage GPT-like LLMs to achieve state-of-the-art performance in the absence of manually annotated issues, i.e., when a gold standard is not available for fine-tuning state-of-the-art approaches based on BERT-like models. Our empirical results show that GPT-like models can achieve a performance comparable to the state-of-the-art without the need for fine-tuning. This suggests that when manual annotation is not feasible or a gold standard for training is not available (e.g., on a new project), maintainers could rely on generative AI to successfully address the issue classification task.</p><p>However, using LLMs to build issue classifiers might pose important challenges due to licensing and computational limitations. As such, we plan to extend this benchmark with open-source LLMs, also including issue-report datasets. 
This will allow us to evaluate the generalizability of our findings.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>//ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org)</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Distribution of labels in the extracted samples.</figDesc><table><row><cell>Label</cell><cell cols="2">Train set</cell><cell cols="2">Test set</cell></row><row><cell>Bug</cell><cell>47</cell><cell>24%</cell><cell>53</cell><cell>27%</cell></row><row><cell>Documentation</cell><cell>33</cell><cell>17%</cell><cell>32</cell><cell>16%</cell></row><row><cell>Feature</cell><cell>60</cell><cell>30%</cell><cell>55</cell><cell>28%</cell></row><row><cell>Question</cell><cell>44</cell><cell>22%</cell><cell>47</cell><cell>24%</cell></row><row><cell>Discarded</cell><cell>16</cell><cell>8%</cell><cell>13</cell><cell>7%</cell></row><row><cell>Total</cell><cell>200</cell><cell></cell><cell>200</cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2</head><label>2</label><figDesc>Performance of the SETFIT model and comparison with the RoBERTa baseline approach. The performance of the model submitted to the challenge is reported in italics. In bold, we highlight the best performance obtained with SETFIT.</figDesc><table><row><cell></cell><cell>Model</cell><cell></cell><cell>Train</cell><cell></cell><cell>Test</cell><cell>F1</cell></row><row><cell>(a)</cell><cell>SETFIT</cell><cell>Sampled</cell><cell>Manual labels</cell><cell>Sampled</cell><cell>Manual labels</cell><cell>0.8321</cell></row><row><cell>(b1)</cell><cell>RoBERTa</cell><cell>Sampled</cell><cell>Manual labels</cell><cell>Sampled</cell><cell>Manual labels</cell><cell>0.4348</cell></row><row><cell>(b2)</cell><cell>RoBERTa</cell><cell>Full</cell><cell>GitHub labels</cell><cell>Sampled</cell><cell>Manual labels</cell><cell>0.8182</cell></row><row><cell>(c)</cell><cell>SETFIT</cell><cell>Sampled</cell><cell>Manual labels</cell><cell>Full</cell><cell>GitHub labels</cell><cell>0.7767</cell></row><row><cell>(d)</cell><cell>RoBERTa (baseline)</cell><cell>Full</cell><cell>GitHub labels</cell><cell>Full</cell><cell>GitHub labels</cell><cell>0.8890</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 3</head><label>3</label><figDesc>Comparison between SETFIT and GPT-3.5.</figDesc><table><row><cell></cell><cell cols="3">SETFIT</cell><cell cols="3">GPT-3.5 (16k-0613), zero-shot</cell></row><row><cell>Label</cell><cell>Precision</cell><cell>Recall</cell><cell>F1-Score</cell><cell>Precision</cell><cell>Recall</cell><cell>F1-Score</cell></row><row><cell>Bug</cell><cell>0.8723</cell><cell>0.8472</cell><cell>0.8590</cell><cell>0.7133</cell><cell>0.9811</cell><cell>0.8261</cell></row><row><cell>Documentation</cell><cell>0.9039</cell><cell>0.6594</cell><cell>0.7616</cell><cell>0.8853</cell><cell>0.6191</cell><cell>0.7285</cell></row><row><cell>Feature</cell><cell>0.7494</cell><cell>0.9182</cell><cell>0.8251</cell><cell>0.8861</cell><cell>0.8491</cell><cell>0.8672</cell></row><row><cell>Question</cell><cell>0.8754</cell><cell>0.8319</cell><cell>0.8528</cell><cell>0.8668</cell><cell>0.7719</cell><cell>0.8164</cell></row><row><cell>Overall</cell><cell>0.8321</cell><cell>0.8321</cell><cell>0.8321</cell><cell>0.8155</cell><cell>0.8155</cell><cell>0.8155</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">https://github.com/cbaziotis/ekphrasis</note>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This research was co-funded by the NRRP Initiative, Mission 4, Component 2, Investment 1.3 -Partnerships extended to universities, research centres, companies and research D.D. MUR n. 341 del 15.03.2022 -Next Generation EU ("FAIR -Future Artificial Intelligence Research", code PE00000013, CUP H97G22000210007) and by the European Union -NextGenerationEU through the Italian Ministry of University and Research, Projects PRIN 2022 ("QualAI: Continuous Quality Improvement of AI-based Systems", grant n. 2022B3BP5S, CUP: H53D23003510006).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Is it a bug or an enhancement? A text-based approach to classify change requests</title>
		<author>
			<persName><forename type="first">G</forename><surname>Antoniol</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Ayari</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Di Penta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Khomh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y.-G</forename><surname>Guéhéneuc</surname></persName>
		</author>
		<idno type="DOI">10.1145/1463788.1463819</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 2008 Conf. of the Center for Advanced Studies on Collaborative Research: Meeting of Minds, CASCON &apos;08</title>
				<meeting>of the 2008 Conf. of the Center for Advanced Studies on Collaborative Research: Meeting of Minds, CASCON &apos;08<address><addrLine>New York, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>ACM</publisher>
			<date type="published" when="2008">2008</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">It&apos;s not a bug, it&apos;s a feature: How misclassification impacts bug prediction</title>
		<author>
			<persName><forename type="first">K</forename><surname>Herzig</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Just</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Zeller</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICSE.2013.6606585</idno>
	</analytic>
	<monogr>
		<title level="m">2013 35th Int&apos;l Conf. on Software Engineering (ICSE)</title>
				<imprint>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Automated classification of software issue reports using machine learning techniques: an empirical study</title>
		<author>
			<persName><forename type="first">N</forename><surname>Pandey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sanyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Hudait</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Sen</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11334-017-0294-1</idno>
	</analytic>
	<monogr>
		<title level="j">Innovations in Systems and Software Engineering</title>
		<imprint>
			<date type="published" when="2017">2017</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Neural word embedding as implicit matrix factorization</title>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Goldberg</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">Z</forename><surname>Ghahramani</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Welling</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">C</forename><surname>Cortes</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><surname>Lawrence</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><forename type="middle">Q</forename><surname>Weinberger</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Assoc., Inc</publisher>
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Distributed representations of words and phrases and their compositionality</title>
		<author>
			<persName><forename type="first">T</forename><surname>Mikolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Corrado</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Dean</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 26th Int&apos;l Conf. on Neural Inf. Proc. Systems - Volume 2, NIPS&apos;13</title>
				<meeting>of the 26th Int&apos;l Conf. on Neural Inf. Proc. Systems - Volume 2, NIPS&apos;13<address><addrLine>Red Hook, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Curran Associates Inc</publisher>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Predicting issue types on GitHub</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Di Sorbo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Canfora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Panichella</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.scico.2020.102598</idno>
		<ptr target="https://doi.org/10.1016/j.scico.2020.102598" />
	</analytic>
	<monogr>
		<title level="j">Science of Computer Programming</title>
		<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Ticket tagger: Machine learning driven issue classification</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Di Sorbo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Canfora</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Panichella</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICSME.2019.00070</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE Int&apos;l. Conf on Software Maintenance and Evolution (ICSME)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">BERT: Pre-training of deep bidirectional transformers for language understanding</title>
		<author>
			<persName><forename type="first">J</forename><surname>Devlin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-W</forename><surname>Chang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Toutanova</surname></persName>
		</author>
		<idno type="DOI">10.18653/v1/N19-1423</idno>
	</analytic>
	<monogr>
		<title level="m">Proc. of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, ACL</title>
		<meeting>Proc. of the 2019 Conf. of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, ACL</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">NLBSE&apos;22 tool competition</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Chaparro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Di Sorbo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Panichella</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of The 1st Int&apos;l Work. on Natural Language-based Software Eng. (NLBSE&apos;22)</title>
		<meeting>Proc. of The 1st Int&apos;l Work. on Natural Language-based Software Eng. (NLBSE&apos;22)</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Issue report classification using pre-trained language models</title>
		<author>
			<persName><forename type="first">G</forename><surname>Colavito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Lanubile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Novielli</surname></persName>
		</author>
		<idno type="DOI">10.1145/3528588.3528659</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE/ACM 1st Int&apos;l Workshop on Natural Language-Based Software Eng. (NLBSE)</title>
				<meeting><address><addrLine>USA</addrLine></address></meeting>
		<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Izadi</surname></persName>
		</author>
		<idno type="DOI">10.1145/3528588.3528662</idno>
		<title level="m">CatIss: An Intelligent Tool for Categorizing Issues Reports using Transformers</title>
		<meeting>NLBSE</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<title level="m" type="main">RoBERTa: A robustly optimized BERT pretraining approach</title>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ott</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Goyal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Du</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Joshi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Levy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lewis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zettlemoyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Stoyanov</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1907.11692</idno>
		<imprint>
			<date type="published" when="2019">2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">Data quality matters: A case study on data label correctness for security bug report prediction</title>
		<author>
			<persName><forename type="first">X</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lo</surname></persName>
		</author>
		<idno type="DOI">10.1109/TSE.2021.3063727</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Software Engineering</title>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<title level="m" type="main">Efficient Few-Shot Learning Without Prompts</title>
		<author>
			<persName><forename type="first">L</forename><surname>Tunstall</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Reimers</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><forename type="middle">E S</forename><surname>Jo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Bates</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Korat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wasserblat</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Pereg</surname></persName>
		</author>
		<idno type="DOI">10.48550/arXiv.2209.11055</idno>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<analytic>
		<title level="a" type="main">Few-shot learning for issue report classification</title>
		<author>
			<persName><forename type="first">G</forename><surname>Colavito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Lanubile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Novielli</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE/ACM 2nd Int&apos;l Work. on Natural Language-Based Software Eng. (NLBSE)</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<author>
			<persName><forename type="first">X</forename><surname>Hou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Luo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Lo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Grundy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2308.10620</idno>
		<title level="m">Large language models for software engineering: A systematic literature review</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Fan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Gokkaya</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Harman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Lyubarskiy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sengupta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Yoo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Zhang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2310.03533</idno>
		<title level="m">Large language models for software engineering: Survey and open problems</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">ChatGPT: Optimizing Language Models for Dialogue</title>
		<author>
			<persName><surname>OpenAI</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Leveraging GPT-like LLMs to automate issue labeling</title>
		<author>
			<persName><forename type="first">G</forename><surname>Colavito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Lanubile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Novielli</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Quaranta</surname></persName>
		</author>
		<idno type="DOI">10.1145/3643991.3644903</idno>
	</analytic>
	<monogr>
		<title level="m">IEEE/ACM 21st International Conference on Mining Software Repositories (MSR) (to appear)</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b19">
	<analytic>
		<title level="a" type="main">The NLBSE&apos;23 tool competition</title>
		<author>
			<persName><forename type="first">R</forename><surname>Kallis</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Izadi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Pascarella</surname></persName>
		</author>
		<author>
			<persName><forename type="first">O</forename><surname>Chaparro</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Rani</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. of The 2nd Intl. Work. on Natural Language-based Software Engineering (NLBSE&apos;23)</title>
		<meeting>Proc. of The 2nd Intl. Work. on Natural Language-based Software Engineering (NLBSE&apos;23)</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b20">
	<analytic>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">J</forename><surname>Viera</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Garrett</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Understanding interobserver agreement: the kappa statistic</title>
				<imprint>
			<date type="published" when="2005">2005</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b21">
	<monogr>
		<title level="m" type="main">Few-shot learning for issue report classification</title>
		<author>
			<persName><forename type="first">G</forename><surname>Colavito</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Lanubile</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Novielli</surname></persName>
		</author>
		<idno type="DOI">10.5281/zenodo.7628150</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b22">
	<analytic>
		<title level="a" type="main">Language models are few-shot learners</title>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">B</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ryder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subbiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shyam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Herbert-Voss</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Krueger</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Henighan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Child</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Ramesh</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">M</forename><surname>Ziegler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Winter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Hesse</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Sigler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Litwin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Gray</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Chess</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Clark</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Berner</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>McCandlish</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Radford</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Sutskever</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Amodei</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS&apos;20</title>
		<meeting>Proceedings of the 34th International Conference on Neural Information Processing Systems, NIPS&apos;20<address><addrLine>Red Hook, NY, USA</addrLine></address></meeting>
		<imprint>
			<publisher>Curran Associates Inc</publisher>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b23">
	<monogr>
		<title level="m" type="main">LLM is like a box of chocolates: the nondeterminism of ChatGPT in code generation</title>
		<author>
			<persName><forename type="first">S</forename><surname>Ouyang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">M</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Harman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Wang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2308.02828</idno>
		<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b24">
	<analytic>
		<title level="a" type="main">Chain-of-thought prompting elicits reasoning in large language models</title>
		<author>
			<persName><forename type="first">J</forename><surname>Wei</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Schuurmans</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Bosma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Ichter</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Xia</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Chi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><forename type="middle">V</forename><surname>Le</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Koyejo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Mohamed</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Agarwal</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Belgrave</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Oh</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="24824" to="24837" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b25">
	<analytic>
		<title level="a" type="main">Large language models are zero-shot reasoners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kojima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Reid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Matsuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Iwasawa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Advances in Neural Information Processing Systems</title>
				<editor>
			<persName><forename type="first">S</forename><surname>Koyejo</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">S</forename><surname>Mohamed</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Agarwal</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><surname>Belgrave</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">K</forename><surname>Cho</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">A</forename><surname>Oh</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2022">2022</date>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="22199" to="22213" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
