<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Decompositional Semantic Analysis for LLM-based Code Quality Evaluation ⋆</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Fangzhou</forename><surname>Xu</surname></persName>
							<email>xu_fangzhou@tju.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="department">College of Intelligence and Computing</orgName>
								<orgName type="institution">Tianjin University</orgName>
								<address>
									<postCode>300350</postCode>
									<settlement>Tianjin</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sai</forename><surname>Zhang</surname></persName>
							<email>zhang_sai@tju.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="department">College of Intelligence and Computing</orgName>
								<orgName type="institution">Tianjin University</orgName>
								<address>
									<postCode>300350</postCode>
									<settlement>Tianjin</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Xiaowang</forename><surname>Zhang</surname></persName>
							<email>xiaowangzhang@tju.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="department">College of Intelligence and Computing</orgName>
								<orgName type="institution">Tianjin University</orgName>
								<address>
									<postCode>300350</postCode>
									<settlement>Tianjin</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Yahong</forename><surname>Han</surname></persName>
							<email>yahong@tju.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="department">College of Intelligence and Computing</orgName>
								<orgName type="institution">Tianjin University</orgName>
								<address>
									<postCode>300350</postCode>
									<settlement>Tianjin</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Decompositional Semantic Analysis for LLM-based Code Quality Evaluation ⋆</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">3B7BFB75CBC19FD58751C2C5863AB93D</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Code evaluation</term>
					<term>Large Language Models</term>
					<term>Code Semantic</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Code quality evaluation involves scoring the quality of generated code against a reference code. Extensive research has demonstrated that current evaluation metrics do not truly reflect code quality. We propose Decompositional Semantic Analysis for LLM-based Code Quality Evaluation. We employ a decompositional approach that lets LLMs analyze portions of code semantics independently each time, obtaining the full code semantics through multiple interactions with the LLM. We designed a Semantic Storage unit that makes this independent analysis feasible by retrieving related semantic descriptions. Experimental results indicate that our approach surpasses existing state-of-the-art methods in correlation with code execution.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Code quality evaluation involves scoring the quality of generated code against a reference code for a specific problem statement. Existing methods <ref type="bibr" target="#b0">[1]</ref>  <ref type="bibr" target="#b1">[2]</ref> rely on superficial code matching as an evaluation metric, which fails to capture code semantics accurately. Moreover, extensive research has demonstrated that existing methods do not truly reflect code quality <ref type="bibr" target="#b2">[3]</ref>.</p><p>With the development of large language models (LLMs) in recent years, studies <ref type="bibr" target="#b3">[4]</ref> have proven the feasibility of using LLMs as evaluators for generative tasks. However, due to issues such as hallucination and uncertainty in LLMs <ref type="bibr" target="#b4">[5]</ref>, their correlation with code execution remains low <ref type="bibr" target="#b5">[6]</ref>, making the direct use of LLMs for code quality evaluation challenging. To address these issues, we propose Decompositional Semantic Analysis for LLM-based Code Quality Evaluation (DSA-CQE). We employ a decompositional approach that lets LLMs comprehend portions of code semantics independently each time, obtaining the full code semantics through multiple interactions with the LLM. We designed a Semantic Storage unit to make this independent analysis feasible, allowing LLMs to derive more accurate semantics by breaking down complex problems. Finally, the generated code is scored based on a semantic comparison between it and the reference code. Experimental results indicate that DSA-CQE surpasses existing state-of-the-art methods in terms of correlation with code execution. Fig <ref type="figure" target="#fig_0">1</ref> illustrates the overall framework of DSA-CQE. DSA-CQE takes the generated code and the reference code as input and outputs a score for the generated code. First, the semantics of both codes are obtained through a Decompositional Code Semantic Analysis unit. Subsequently, the code semantic comparison unit determines the differences in semantics. Finally, the generated code's score is derived by analyzing these semantic differences through an LLM. In Decompositional Code Semantic Analysis, we consider eight types of Abstract Syntax Tree (AST) <ref type="bibr" target="#b6">[7]</ref> nodes as our predefined nodes: "For", "While", "Assign", "If", "ClassDef", "FunctionDef", "Switch", and "Call". We perform a depth-first traversal of the code's AST, extracting the "subtrees" under predefined nodes as sub-codes. This approach decomposes the originally complex code into simpler sub-codes, allowing the LLM to perform semantic analysis 1 on each part separately, thereby reducing the hallucination phenomenon <ref type="bibr" target="#b4">[5]</ref>.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Approach</head><p>After decomposing the code into several sub-codes, they cannot simply be analyzed in isolation, as most code segments are interrelated through references and dependencies. Analyzing them in isolation could miss external references, such as variables and function definitions. We therefore designed a Semantic Storage unit that stores textual descriptions of semantics during the analysis process, which may be required for subsequent code semantic analysis. As shown in Fig <ref type="figure" target="#fig_2">2</ref>, a search is conducted within the Semantic Storage unit to retrieve relevant semantic descriptions. These descriptions are concatenated with the original sub-code and, together with a pre-designed prompt template, are input into the LLM to obtain the semantic description of the sub-code. For example, variables such as 'n', 'cap', and 'wei', which appeared previously in other sub-codes, can easily be misunderstood by the LLM without additional semantic information. Without context, the LLM might misinterpret 'n' as any generic integer or 'cap' as an abbreviation unrelated to the problem domain. However, after conducting semantic analysis on the earlier sub-codes, the semantics of these variables have already been stored in the Semantic Storage unit. We only need to retrieve these stored semantics and incorporate them into the prompt template to provide the LLM with the necessary semantic context for these external variables.</p><p>The semantics stored in the Semantic Storage unit are not static. Each time a semantic description of a sub-code is obtained, the LLM is prompted to update the semantic descriptions of each external variable based on the new description. These updated descriptions are then re-stored in the Semantic Storage unit for further analysis. As shown in Fig <ref type="figure" target="#fig_2">2</ref>, the variable 'dp', initially described as "a dynamic programming array initialized to 0", is updated to "stores the maximum value for each possible weight" after semantic analysis.</p></div>
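A minimal sketch of how such a Semantic Storage unit could work, with the LLM call left out: identifiers read by a sub-code are looked up in a plain mapping, and descriptions are overwritten as the analysis refines them. The 'dp' descriptions are taken from the paper's example; the 'cap' entry is hypothetical.

```python
import ast

class SemanticStorage:
    """Maps identifiers (variables, functions) to textual semantic
    descriptions accumulated during decompositional analysis."""

    def __init__(self):
        self._store = {}

    def retrieve(self, sub_code: str) -> dict:
        """Return stored descriptions for every name the sub-code reads."""
        tree = ast.parse(sub_code)
        loaded = {n.id for n in ast.walk(tree)
                  if isinstance(n, ast.Name) and isinstance(n.ctx, ast.Load)}
        return {name: self._store[name] for name in loaded if name in self._store}

    def update(self, name: str, description: str):
        """Re-store a (possibly revised) description, replacing the old one."""
        self._store[name] = description

storage = SemanticStorage()
storage.update("dp", "a dynamic programming array initialized to 0")
storage.update("cap", "the knapsack's maximum weight capacity")  # hypothetical entry

# Retrieval supplies the external context the LLM would otherwise lack.
sub_code = "dp[w] = max(dp[w], dp[w - wei[i]] + val[i])"
context = storage.retrieve(sub_code)  # finds the stored description of 'dp'
prompt = "\n".join(f"{k}: {v}" for k, v in context.items()) + "\n" + sub_code

# After analysis, the LLM's revised description is written back, as in Fig 2.
storage.update("dp", "stores the maximum value for each possible weight")
```

Here the prompt concatenation and the write-back step stand in for the pre-designed prompt template and the LLM's update response, respectively.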
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experiments</head><p>We conducted our experiments (following previous work <ref type="bibr" target="#b3">[4]</ref>) on the HumanEval dataset <ref type="bibr" target="#b7">[8]</ref> exclusively, as most of the code samples in the CoNaLa <ref type="bibr" target="#b8">[9]</ref> subset of the evaluation dataset <ref type="bibr" target="#b2">[3]</ref> are single-line snippets lacking complex semantics. While the Card2Code Hearthstone <ref type="bibr" target="#b9">[10]</ref> subset contains semantically more complex structures, such as "classes", these "classes" follow a uniform structure with minimal variation. In practice, a significant portion of code exhibits both complexity and semantic diversity. In contrast, the HumanEval dataset contains a rich and diverse range of code samples, making it the ideal choice for our experiments and evaluation. Cassano et al. <ref type="bibr" target="#b10">[11]</ref> ran test cases on the HumanEval dataset and provided the functional correctness of each piece of code. We use the Pearson <ref type="bibr" target="#b11">[12]</ref> and Kendall <ref type="bibr" target="#b12">[13]</ref> correlation coefficients between the functional correctness scores and the scores given by different methods for comparison. To ensure fairness, we uniformly used GPT-3.5 Turbo <ref type="bibr" target="#b13">[14]</ref> as the backbone model and set the LLM temperature to 0.2. We used state-of-the-art evaluation methods based on n-gram matching and deep learning, namely CodeBleu <ref type="bibr" target="#b0">[1]</ref> and CodeBertScore <ref type="bibr" target="#b1">[2]</ref>, as baselines. The 1-shot prompt used Zhuo's prompt template <ref type="bibr" target="#b3">[4]</ref>. Simplified DSA-CQE is a variant of our framework that replaces decompositional analysis with single-step analysis using LLMs 1 .</p><p>The experimental results are shown in Table 1.
As can be seen, DSA-CQE performed significantly better on the HumanEval dataset than traditional code evaluation methods, with a Pearson correlation coefficient of 0.594. The 1-shot prompt and Simplified DSA-CQE achieved Pearson correlation coefficients of 0.106 and 0.512, respectively. This indicates that DSA-CQE, through decompositional semantic analysis, enhances the LLM's comprehension of code semantics and improves overall performance in code evaluation.</p><p>Our current experiment focuses solely on evaluating the quality of Python code. However, since the method relies on the Abstract Syntax Tree, adapting it to other programming languages</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Framework of DSA-CQE.</figDesc></figure>
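The two correlation measures used in the comparison can be computed directly from their textbook definitions. The following is a self-contained sketch; the score vectors are illustrative only, not the paper's data:

```python
from math import sqrt
from itertools import combinations

def pearson(x, y):
    """Pearson r: covariance normalised by both standard deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / sqrt(sum((a - mx) ** 2 for a in x)
                      * sum((b - my) ** 2 for b in y))

def kendall_tau(x, y):
    """Kendall tau-a: (concordant - discordant) pairs over all pairs
    (the tie-corrected tau-b would adjust the denominator)."""
    s = 0
    for i, j in combinations(range(len(x)), 2):
        d = (x[i] - x[j]) * (y[i] - y[j])
        s += 1 if d > 0 else -1 if d < 0 else 0
    return s / (len(x) * (len(x) - 1) / 2)

# Illustrative data: functional correctness (fraction of tests passed)
# vs. a method's quality scores for five generated programs.
functional = [1.0, 0.0, 0.6, 1.0, 0.2]
scores     = [0.9, 0.1, 0.5, 0.8, 0.3]

r   = pearson(functional, scores)      # close to 1: scores track correctness
tau = kendall_tau(functional, scores)
```

A high coefficient means the method's scores rise and fall with actual execution results, which is exactly what Table 1 measures for each baseline.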
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Fig 1</head><label>1</label><figDesc>Fig 1 illustrates the overall framework of DSA-CQE.</figDesc><graphic coords="2,89.29,121.24,416.70,59.84" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Demonstrates how the Semantic Storage unit eliminates external dependencies, as well as the process of updating its internal semantic descriptions</figDesc><graphic coords="3,89.29,84.19,416.70,85.90" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Kendall-Tau (𝜏) and Pearson (𝑟 𝑠 ) correlations. The best performance is in bold.</figDesc><table><row><cell>Method</cell><cell>𝑟 𝑠</cell><cell>𝜏</cell></row><row><cell>CodeBleu</cell><cell>.295</cell><cell>.241</cell></row><row><cell>CodeBertScore</cell><cell>.430</cell><cell>.352</cell></row><row><cell>1-shot</cell><cell>.106</cell><cell>.105</cell></row><row><cell>Simplified DSA-CQE</cell><cell>.512</cell><cell>.470</cell></row><row><cell>DSA-CQE</cell><cell>.594</cell><cell>.553</cell></row></table></figure>
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><p>involves merely substituting the relevant parser. For instance, Java code can be parsed using JavaParser <ref type="bibr" target="#b14">[15]</ref>, while pycparser <ref type="bibr" target="#b15">[16]</ref> can be used for C code.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>In this poster, we propose Decompositional Semantic Analysis for LLM-based Code Quality Evaluation. We employ a decompositional approach that lets LLMs analyze portions of code semantics independently each time, obtaining the full code semantics through multiple interactions with the LLM. We designed a Semantic Storage unit that makes this independent analysis feasible by retrieving related semantic descriptions. The generated code is then scored based on a semantic comparison between it and the reference code. The experimental results show that DSA-CQE surpasses existing state-of-the-art methods in correlation with code execution.</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Guo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Lu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Tang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Sundaresan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Blanco</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ma</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2009.10297</idno>
		<title level="m">Codebleu: a method for automatic evaluation of code synthesis</title>
				<imprint>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">S</forename><surname>Zhou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">U</forename><surname>Alon</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Agarwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Neubig</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2302.05527</idno>
		<title level="m">Codebertscore: Evaluating code generation with pretrained models of code</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Out of the bleu: how should we assess quality of the code generation models?</title>
		<author>
			<persName><forename type="first">M</forename><surname>Evtikhiev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Bogomolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Sokolov</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Bryksin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Systems and Software</title>
		<imprint>
			<biblScope unit="volume">203</biblScope>
			<biblScope unit="page">111741</biblScope>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">Y</forename><surname>Zhuo</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2304.14317</idno>
		<title level="m">Large language models are state-of-the-art evaluators of code generation</title>
				<imprint>
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Survey of hallucination in natural language generation</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Frieske</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Yu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Su</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Ishii</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">J</forename><surname>Bang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Madotto</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Fung</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Computing Surveys</title>
		<imprint>
			<biblScope unit="volume">55</biblScope>
			<biblScope unit="page" from="1" to="38" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Zhong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Mao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Jiao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Zhu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Ji</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Han</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2210.07197</idno>
		<title level="m">Towards a unified multi-dimensional evaluator for text generation</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Understanding source code evolution using abstract syntax tree matching</title>
		<author>
			<persName><forename type="first">I</forename><surname>Neamtiu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename><surname>Foster</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Hicks</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 2005 international workshop on Mining software repositories</title>
				<meeting>the 2005 international workshop on Mining software repositories</meeting>
		<imprint>
			<date type="published" when="2005">2005</date>
			<biblScope unit="page" from="1" to="5" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<monogr>
		<author>
			<persName><forename type="first">M</forename><surname>Chen</surname></persName>
		</author>
		<idno type="arXiv">arXiv:2107.03374</idno>
		<title level="m">Evaluating large language models trained on code</title>
				<imprint>
			<date type="published" when="2021">2021</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Learning to mine aligned code and natural language pairs from stack overflow</title>
		<author>
			<persName><forename type="first">P</forename><surname>Yin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Deng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Vasilescu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Neubig</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 15th international conference on mining software repositories</title>
				<meeting>the 15th international conference on mining software repositories</meeting>
		<imprint>
			<date type="published" when="2018">2018</date>
			<biblScope unit="page" from="476" to="486" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<monogr>
		<author>
			<persName><forename type="first">W</forename><surname>Ling</surname></persName>
		</author>
		<author>
			<persName><forename type="first">E</forename><surname>Grefenstette</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Hermann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Kočiskỳ</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Senior</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Blunsom</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1603.06744</idno>
		<title level="m">Latent predictor networks for code generation</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Multipl-e: A scalable and polyglot approach to benchmarking neural code generation</title>
		<author>
			<persName><forename type="first">F</forename><surname>Cassano</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Gouwar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Nguyen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Phipps-Costin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Pinckney</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M.-H</forename><surname>Yee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Zi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">J</forename><surname>Anderson</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">Q</forename><surname>Feldman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Guha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Greenberg</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Jangda</surname></persName>
		</author>
		<idno type="DOI">10.1109/TSE.2023.3267446</idno>
	</analytic>
	<monogr>
		<title level="j">IEEE Transactions on Software Engineering</title>
		<imprint>
			<biblScope unit="volume">49</biblScope>
			<biblScope unit="page" from="3675" to="3691" />
			<date type="published" when="2023">2023</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<monogr>
		<author>
			<persName><forename type="first">I</forename><surname>Cohen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Benesty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Benesty</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">I</forename><surname>Cohen</surname></persName>
		</author>
		<title level="m">Pearson correlation coefficient, Noise reduction in speech processing</title>
				<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="page" from="1" to="4" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<analytic>
		<title level="a" type="main">A new measure of rank correlation</title>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">G</forename><surname>Kendall</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Biometrika</title>
		<imprint>
			<biblScope unit="volume">30</biblScope>
			<biblScope unit="page" from="81" to="93" />
			<date type="published" when="1938">1938</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<monogr>
		<author>
			<persName><surname>OpenAI</surname></persName>
		</author>
		<ptr target="https://platform.openai.com/docs/guides/text-generation/chat-completions-api" />
		<title level="m">OpenAI GPT-3.5 Turbo</title>
				<imprint>
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m">JavaParser</title>
		<ptr target="https://github.com/javaparser/javaparser" />
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<monogr>
		<ptr target="https://github.com/eliben/pycparser" />
		<title level="m">pycparser</title>
				<imprint/>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
