<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Semantic Error Detection in Code Translation Using Knowledge-Driven Static Analysis with AI Chain</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Lei</forename><surname>Chen</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">College of Intelligence and Computing</orgName>
								<orgName type="institution">Tianjin University</orgName>
								<address>
									<postCode>300350</postCode>
									<settlement>Tianjin</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Sai</forename><surname>Zhang</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">College of Intelligence and Computing</orgName>
								<orgName type="institution">Tianjin University</orgName>
								<address>
									<postCode>300350</postCode>
									<settlement>Tianjin</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Fangzhou</forename><surname>Xu</surname></persName>
							<affiliation key="aff0">
								<orgName type="department">College of Intelligence and Computing</orgName>
								<orgName type="institution">Tianjin University</orgName>
								<address>
									<postCode>300350</postCode>
									<settlement>Tianjin</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Liang</forename><surname>Wan</surname></persName>
							<email>lwan@tju.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="department">College of Intelligence and Computing</orgName>
								<orgName type="institution">Tianjin University</orgName>
								<address>
									<postCode>300350</postCode>
									<settlement>Tianjin</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Xiaowang</forename><surname>Zhang</surname></persName>
							<email>xiaowangzhang@tju.edu.cn</email>
							<affiliation key="aff0">
								<orgName type="department">College of Intelligence and Computing</orgName>
								<orgName type="institution">Tianjin University</orgName>
								<address>
									<postCode>300350</postCode>
									<settlement>Tianjin</settlement>
									<country key="CN">China</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Semantic Error Detection in Code Translation Using Knowledge-Driven Static Analysis with AI Chain</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">291BFFF0D583BBEDE681108DA6109422</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T16:48+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Large Language Models</term>
					<term>Semantic Mistakes</term>
					<term>Knowledge Base</term>
					<term>Code Translation</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>In code translation, neural network-based models frequently generate semantically incorrect code that deviates from the logic of the source code, and this problem persists even with advanced large language models. A recent approach uses test cases to identify these semantic errors, but its effectiveness depends heavily on the quality of the test cases, which makes it unsuitable for real-world code snippets that lack them. To automatically locate semantic errors in code translation without valid test cases, we propose the Knowledge-guided Semantic Analysis Framework (KSAF). KSAF decomposes the source and translated code synchronously and performs static analysis to detect semantic errors, leveraging fine-grained API knowledge together with an AI chain-driven Large Language Model (LLM). On a previously studied benchmark of Python programs, KSAF built on GPT-3.5-turbo correctly located 47.8% of the semantic errors under static evaluation. This is 37.2 percentage points higher than the best baseline using the same base model and 13.4 percentage points higher than the best baseline built on GPT-4-turbo.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Code translation converts a program written in one programming language into another while keeping its original functionality intact. Neural network models have achieved significant success in this task, but recent studies <ref type="bibr" target="#b0">[1,</ref><ref type="bibr" target="#b1">2]</ref> have found that these models often introduce subtle errors, which can be grouped into syntactic and semantic errors. Syntax errors violate the syntax rules of the target language and can often be identified by a grammar checker. Semantic errors are subtler: without violating the target language's syntax, they make the translated code either fail to execute or produce outputs inconsistent with those of the original code <ref type="bibr" target="#b2">[3]</ref>. For example, in the replace call shown in Figure <ref type="figure" target="#fig_1">1</ref>, s.replace('-', '') in Python removes every occurrence of '-', whereas in JavaScript the same call removes only the first occurrence by default.</p><p>To address this, Wang et al. <ref type="bibr" target="#b2">[3]</ref> rely on test cases that expose semantic errors to analyze the code and locate these errors dynamically. However, their method depends heavily on the quality of the test cases, which must be able to reveal the semantic errors, and it cannot handle code snippets that lack valid test cases. 
Additionally, in code translation, executing code against test cases is not only costly but also poses potential security risks <ref type="bibr" target="#b3">[4]</ref>.</p><p>To automatically locate semantic errors in code translation without valid test cases, we propose KSAF, a framework that decomposes the source code and translated code synchronously and statically analyzes them, combining fine-grained knowledge with an AI chain-driven LLM to locate semantic errors. Experiments show that KSAF outperforms prompt-based baselines that, like it, require no test cases. To our knowledge, KSAF is the first method to locate semantic errors in code translation without test cases. It requires only API documentation, needs no model training, and is adaptable to low-resource languages. Figure <ref type="figure" target="#fig_1">1</ref> illustrates the overall framework of our approach. We first build an API knowledge base by crawling the official JavaScript documentation <ref type="bibr" target="#b4">[5]</ref>. We then design a framework based on a knowledge-driven AI chain and code decomposition to locate errors in code translation statically.</p></div>
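<div xmlns="http://www.tei-c.org/ns/1.0"><p>The replace example above can be reproduced in plain Python, without a JavaScript runtime. The sketch below is illustrative and is not part of KSAF: f_gold is the Python source shown in Figure 1, while regroup (the grouping loop that the source and its translation share) and js_translation (a Python emulation of the JavaScript translation) are hypothetical helpers. The emulation assumes only the documented JavaScript behavior that String.prototype.replace, when given a string pattern, replaces just the first match, which Python expresses through the count argument of str.replace.</p>
<p><code lang="python">
```python
def regroup(s: str, k: int) -> str:
    """Grouping loop shared by f_gold and its translation (hypothetical helper)."""
    res = []
    cnt = (len(s) % k) or k  # the first group may be shorter than k
    t = 0
    for i, c in enumerate(s):
        res.append(c)
        t += 1
        if t == cnt:
            t = 0
            cnt = k
            if i != len(s) - 1:  # no trailing separator
                res.append('-')
    return ''.join(res)


def f_gold(s: str, k: int) -> str:
    # Python source: str.replace removes EVERY '-'.
    return regroup(s.replace('-', '').upper(), k)


def js_translation(s: str, k: int) -> str:
    # Hypothetical emulation of the JavaScript translation: s.replace('-', '')
    # with a string pattern removes only the FIRST '-', i.e. Python's count=1.
    return regroup(s.replace('-', '', 1).upper(), k)


if __name__ == "__main__":
    key = "5F3Z-2e-9-w"
    print(f_gold(key, 4))          # 5F3Z-2E9W
    print(js_translation(key, 4))  # 5F-3Z2E--9-W
```
</code></p>
<p>Both functions run without error and share the same control flow, yet their outputs differ, which is the kind of semantic error a syntax checker cannot catch. With a single '-' in the input the two outputs coincide, so a test suite built only from such inputs would not expose the bug. In JavaScript, the usual fix is s.replaceAll('-', '') or a global regular expression such as s.replace(/-/g, '').</p></div>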
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Approach</head><p>We collect API documents from the online resource <ref type="bibr" target="#b4">[5]</ref> using the Beautiful Soup web-scraping library <ref type="bibr" target="#b5">[6]</ref>. Each API document is a crawled web page containing rich information such as the API's name, syntax, parameters, examples, and functional description. We keep only the semi-structured API statement and the functional description. The API statement gives the fully qualified name (FQN) of the API, which serves as the retrieval index of the knowledge base; the functional description states the functional logic and behavior of the API.</p><p>In the Neural Source Map Generator module, shown in Figure <ref type="figure" target="#fig_1">1</ref> A, the source code, the translated code, and a fixed prompt are input into the LLM. This module produces a mapping between the atomic fragments of the source code and their counterparts in the translated code, in the form of an ordered list of these atomic fragments.</p><p>In the Code AST Decomposition module, shown in Figure <ref type="figure" target="#fig_1">1</ref> B, KSAF traverses the abstract syntax tree (AST) of the source code and extracts the subtrees rooted at eight types of nodes as sub-code <ref type="bibr" target="#b6">[7]</ref>. Using the mapping list from Module A, it then obtains the translated code corresponding to each sub-code and passes each sub-code pair to the next module.</p><p>Given a sub-code and its translated counterpart, KSAF uses an LLM to statically analyze the pair and identify semantic inconsistencies between the source and translated code. We designed a knowledge-driven LLM AI chain workflow, shown in Figure <ref type="figure" target="#fig_1">1 C</ref>, with three steps, Checking, Comparing, and Locating, all of which use the same LLM. 
In the Checking step, KSAF inputs the source code, the translated code, and a fixed prompt into the LLM to extract the FQNs of the operators and APIs they use, and passes the results to the Comparing step. In the Comparing step, the FQNs are linked to the offline-built API knowledge base to retrieve the corresponding functional descriptions. These descriptions, together with the Checking results, form a prompt that asks the LLM to summarize precisely how the operators and APIs differ between the source and translated code. In the Locating step, the Comparing results, the source code, and the translated code are given to the LLM as a prompt to identify suspicious lines that may cause semantic inconsistencies between the source and translated code.</p></div>
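<div xmlns="http://www.tei-c.org/ns/1.0"><p>Modules A and B can be pictured with Python's standard ast module. This is a minimal sketch under stated assumptions, not the authors' implementation: the eight node types in SUBCODE_NODE_TYPES are a hypothetical choice (the text does not list the types taken from <ref type="bibr" target="#b6">[7]</ref>), the source map that Module A obtains from the LLM is replaced by a hand-written dictionary from source line numbers to translated line numbers, and the names decompose and translated_lines_for are illustrative.</p>
<p><code lang="python">
```python
import ast
from typing import Iterator

# Hypothetical set of eight sub-code roots; the paper's exact list may differ.
SUBCODE_NODE_TYPES = (
    ast.FunctionDef, ast.For, ast.While, ast.If,
    ast.Assign, ast.AugAssign, ast.Return, ast.Expr,
)


def decompose(source: str) -> Iterator[tuple[str, int, int, str]]:
    """Yield (node kind, first line, last line, text) for each sub-code subtree."""
    tree = ast.parse(source)
    for node in ast.walk(tree):  # breadth-first traversal of the AST
        if isinstance(node, SUBCODE_NODE_TYPES):
            text = ast.get_source_segment(source, node)
            yield type(node).__name__, node.lineno, node.end_lineno, text


def translated_lines_for(first: int, last: int,
                         source_map: dict[int, list[int]]) -> list[int]:
    """Collect the translated lines mapped from source lines first..last."""
    hits: set[int] = set()
    for line in range(first, last + 1):
        hits.update(source_map.get(line, []))
    return sorted(hits)


# Shortened excerpt of the Python source in Figure 1, for illustration only.
SOURCE = """def f_gold(s: str, k: int) -> str:
    s = s.replace('-', '').upper()
    res = []
    return ''.join(res)
"""

# Hypothetical Module A output: source line -> translated (JavaScript) lines.
# Translated line 5 is assumed to be the function's closing brace.
SOURCE_MAP = {1: [1, 5], 2: [2], 3: [3], 4: [4]}

if __name__ == "__main__":
    for kind, first, last, text in decompose(SOURCE):
        print(kind, translated_lines_for(first, last, SOURCE_MAP),
              text.splitlines()[0])
```
</code></p>
<p>Because ast.walk traverses the tree breadth-first, an enclosing statement such as the function definition is yielded before the statements nested inside it, so sub-code pairs are produced at several granularities.</p></div>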
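<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three-step AI chain described above can likewise be sketched as plain function composition. Everything below is an assumption for illustration: the prompt wording (the paper's fixed prompts are not reproduced here), the line-based reply formats, the two-entry KNOWLEDGE_BASE whose descriptions paraphrase the Python and MDN documentation, and fake_llm, a canned stand-in for a real model such as GPT-3.5-turbo. Only the data flow, Checking to Comparing to Locating with a knowledge-base lookup keyed by FQN in between, follows the text.</p>
<p><code lang="python">
```python
from typing import Callable

# Prompt in, completion out; any chat-model client could be wrapped this way.
LLM = Callable[[str], str]

# Hypothetical two-entry knowledge base: FQN -> functional description.
# The descriptions paraphrase the official Python and MDN documentation.
KNOWLEDGE_BASE = {
    "str.replace": "Python: returns a copy with all occurrences of old replaced "
                   "by new; only the first count occurrences if count is given.",
    "String.prototype.replace": "JavaScript: if the pattern is a string, only "
                                "the first occurrence is replaced.",
}


def checking(llm: LLM, src: str, tgt: str) -> list[str]:
    """Checking: ask the LLM for the FQNs of operators and APIs in both snippets."""
    reply = llm("CHECK: list the fully qualified names of all operators and APIs, "
                "one per line.\nSOURCE:\n" + src + "\nTRANSLATION:\n" + tgt)
    return [line.strip() for line in reply.splitlines() if line.strip()]


def comparing(llm: LLM, fqns: list[str], kb: dict[str, str]) -> str:
    """Comparing: link FQNs to the knowledge base, then ask for the differences."""
    docs = "\n".join(f"{fqn}: {kb[fqn]}" for fqn in fqns if fqn in kb)
    return llm("COMPARE: summarize how these operators and APIs differ between "
               "the source and the translation.\n" + docs)


def locating(llm: LLM, differences: str, src: str, tgt: str) -> list[int]:
    """Locating: ask for suspicious translated lines as comma-separated numbers."""
    reply = llm("LOCATE: given these differences, list the suspicious line numbers "
                "of the translation, comma-separated.\n" + differences
                + "\nSOURCE:\n" + src + "\nTRANSLATION:\n" + tgt)
    return [int(tok) for tok in reply.replace(" ", "").split(",") if tok.isdigit()]


def ai_chain(llm: LLM, src: str, tgt: str, kb: dict[str, str]) -> list[int]:
    """Checking -> Comparing -> Locating, all with the same LLM."""
    fqns = checking(llm, src, tgt)
    differences = comparing(llm, fqns, kb)
    return locating(llm, differences, src, tgt)


def fake_llm(prompt: str) -> str:
    """Canned replies keyed on the step tag; stands in for a real model."""
    if prompt.startswith("CHECK"):
        return "str.replace\nString.prototype.replace"
    if prompt.startswith("COMPARE"):
        return "str.replace removes every '-'; String.prototype.replace only the first."
    return "2"


if __name__ == "__main__":
    src = "s = s.replace('-', '').upper()"
    tgt = "function f(s) {\n  s = s.replace('-', '').toUpperCase();\n  return s;\n}"
    print(ai_chain(fake_llm, src, tgt, KNOWLEDGE_BASE))  # [2]
```
</code></p>
<p>Because each step exchanges only text, replacing fake_llm with a real model client leaves the rest of the chain unchanged; the canned reply "2" points at line 2 of tgt, where the translated replace call sits.</p></div>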
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Experiments</head><p>In this section, we compare the effectiveness of KSAF with that of other methods. For a fair comparison, we selected methods that, like KSAF, perform static code analysis without test cases. Specifically, we chose two widely recognized prompt-based methods that aim to fully leverage the potential of foundation models. LLMs with Few-Shot Learning <ref type="bibr" target="#b7">[8]</ref>: a few demonstration examples are provided in the prompt to guide the LLM toward better performance on the task. LLMs with Chain of Thought (CoT) <ref type="bibr" target="#b8">[9]</ref>: "Let's think step by step" is appended to the end of the prompt so that the LLM explains its reasoning or intermediate steps before giving the final answer.</p><p>We evaluated our method and the baselines on the dataset (excluding its test cases) and with the metrics of Wang et al. <ref type="bibr" target="#b2">[3]</ref>. Here, <formula notation="TeX">\mathcal{S}_{sem}</formula>, <formula notation="TeX">\mathcal{S}_{hid}</formula>, and <formula notation="TeX">\mathcal{S}_{dif}</formula> denote the ratios of successfully identified errors to the total numbers of semantic errors, hidden errors, and errors whose outputs differ from those of the source code, respectively. Semantic errors occur when code is syntactically correct but logically flawed, so the program behaves in unexpected ways. Hidden errors are a special kind of semantic error that usually cannot be traced immediately to a specific fix, even when test cases are run. Errors leading to outputs that differ from those of the source code are another type of semantic error: they do not cause a run-time error, but they make the output of the translated code in unit tests inconsistent with that of the source code <ref type="bibr" target="#b2">[3]</ref>.</p><p>As shown in Table <ref type="table" target="#tab_0">1</ref>, KSAF with GPT-3.5-turbo achieves the highest score on every metric, tying CoT + GPT-4-turbo-preview on <formula notation="TeX">\mathcal{S}_{dif}</formula> (45.1%). In addition, the method proposed by Wang et al. 
is unable to handle code without test cases and therefore scores zero on all metrics. Following the experimental setup of previous work <ref type="bibr" target="#b2">[3]</ref>, we also found that KSAF reported an average of 3.0 suspicious lines, which represents 16.5% of the total lines of code. This suggests that users typically need to review only 1 to 3 lines to understand and fix a semantic error.</p></div>
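<div xmlns="http://www.tei-c.org/ns/1.0"><p>The three ratios defined above reduce to simple counting once each error is labeled. The sketch below is a hypothetical evaluation harness, not the benchmark's official script: the Case record and its field names, the success criterion (an error counts as located when at least one reported suspicious line is a ground-truth buggy line), and the per-case averaging of suspicious lines are all assumptions, and the toy data are invented.</p>
<p><code lang="python">
```python
from dataclasses import dataclass


@dataclass
class Case:
    """One semantic error in a translated program (hypothetical record)."""
    hidden: bool           # counted in S_hid
    different: bool        # counted in S_dif: output differs from the source's
    buggy_lines: set[int]  # ground-truth faulty lines in the translation
    reported: set[int]     # suspicious lines returned by the locator
    total_lines: int       # lines of code in the translation


def located(case: Case) -> bool:
    # Assumed success criterion: at least one reported line is a buggy line.
    return bool(case.buggy_lines.intersection(case.reported))


def ratio(cases: list[Case]) -> float:
    """Share of the given errors that were successfully located."""
    return sum(located(c) for c in cases) / len(cases) if cases else 0.0


def metrics(cases: list[Case]) -> dict[str, float]:
    n = max(len(cases), 1)  # avoid division by zero on an empty list
    return {
        "S_sem": ratio(cases),
        "S_hid": ratio([c for c in cases if c.hidden]),
        "S_dif": ratio([c for c in cases if c.different]),
        # Average number of suspicious lines and their share of the code.
        "avg_suspicious_lines": sum(len(c.reported) for c in cases) / n,
        "avg_share_of_code": sum(len(c.reported) / c.total_lines for c in cases) / n,
    }


# Invented toy data for illustration; these are not results from the paper.
TOY = [
    Case(hidden=True, different=False, buggy_lines={2}, reported={2, 5}, total_lines=10),
    Case(hidden=False, different=True, buggy_lines={3}, reported={7}, total_lines=10),
    Case(hidden=False, different=True, buggy_lines={4}, reported={4}, total_lines=20),
]

if __name__ == "__main__":
    print(metrics(TOY))
```
</code></p>
<p>On the toy data, two of the three errors are located, so S_sem is 2/3, while S_hid and S_dif are computed only over their own subsets.</p></div>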
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Conclusion</head><p>This paper proposes a method that combines code AST decomposition and fine-grained knowledge with an AI chain-driven LLM to locate semantic inconsistencies between source and translated code. The method effectively handles code without test cases. In future work, we plan to extend our approach to multi-language datasets and conduct comprehensive experiments to further validate the effectiveness of KSAF.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head></head><label></label><figDesc>Panels of Figure 1 ("Knowledge Base", "(C) Checking") showing the example Python function f_gold(s, k), which removes every '-' from s, upper-cases it, and regroups the characters into '-'-separated blocks of k characters (the first block may be shorter), together with its JavaScript translation, which begins with s.replace('-', '').toUpperCase().</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: A static analysis framework based on the Large Language Model (LLM).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Performance of the baseline methods and KSAF on the benchmark, given as the percentage of errors successfully located.</figDesc><table><row role="label"><cell>Method</cell><cell><formula notation="TeX">\mathcal{S}_{sem}</formula></cell><cell><formula notation="TeX">\mathcal{S}_{hid}</formula></cell><cell><formula notation="TeX">\mathcal{S}_{dif}</formula></cell></row><row><cell>Few-Shot + GPT-3.5-turbo</cell><cell>4.3%</cell><cell>3.1%</cell><cell>2.6%</cell></row><row><cell>CoT + GPT-3.5-turbo</cell><cell>10.6%</cell><cell>13.0%</cell><cell>14.0%</cell></row><row><cell>Few-Shot + GPT-4-turbo-preview</cell><cell>32.1%</cell><cell>36.2%</cell><cell>34.5%</cell></row><row><cell>CoT + GPT-4-turbo-preview</cell><cell>34.4%</cell><cell>44.6%</cell><cell>45.1%</cell></row><row><cell>KSAF + GPT-3.5-turbo</cell><cell>47.8%</cell><cell>46.4%</cell><cell>45.1%</cell></row></table></figure>
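<div xmlns="http://www.tei-c.org/ns/1.0"><p>The improvements quoted in the abstract are absolute differences, in percentage points, between KSAF and the strongest baseline for each base model in Table <ref type="table" target="#tab_0">1</ref>. The short check below recomputes them from the table values; TABLE_1 simply transcribes the table, and the helper name gaps is illustrative.</p>
<p><code lang="python">
```python
# Table 1 transcribed: (S_sem, S_hid, S_dif) in percent.
TABLE_1 = {
    "Few-Shot + GPT-3.5-turbo": (4.3, 3.1, 2.6),
    "CoT + GPT-3.5-turbo": (10.6, 13.0, 14.0),
    "Few-Shot + GPT-4-turbo-preview": (32.1, 36.2, 34.5),
    "CoT + GPT-4-turbo-preview": (34.4, 44.6, 45.1),
    "KSAF + GPT-3.5-turbo": (47.8, 46.4, 45.1),
}


def gaps(method: str, reference: str = "KSAF + GPT-3.5-turbo") -> tuple[float, ...]:
    """Percentage-point lead of `reference` over `method` on each metric."""
    # Rounding to one decimal removes floating-point noise from the subtraction.
    return tuple(round(r - m, 1)
                 for r, m in zip(TABLE_1[reference], TABLE_1[method]))


if __name__ == "__main__":
    for name in TABLE_1:
        print(f"{name:32s}", gaps(name))
```
</code></p>
<p>Running it reproduces the 37.2-point and 13.4-point leads on S_sem over CoT + GPT-3.5-turbo and CoT + GPT-4-turbo-preview, and shows a zero gap on S_dif against CoT + GPT-4-turbo-preview.</p></div>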
		</body>
		<back>

			<div type="acknowledgement">
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Acknowledgments</head><p>This work was supported by the Project of Science and Technology Research and Development Plan of China Railway Corporation (N2023J044).</p></div>
			</div>

			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Lost in translation: A study of bugs introduced by large language models while translating code</title>
		<author>
			<persName><forename type="first">R</forename><surname>Pan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">R</forename><surname>Ibrahimzada</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Krishna</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Sankar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><forename type="middle">P</forename><surname>Wassi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Merler</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Sobolev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Pavuluri</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Sinha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Jabbarvand</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">IEEE/ACM 46th International Conference on Software Engineering (ICSE)</title>
				<imprint>
			<publisher>IEEE Computer Society</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="866" to="866" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Jigsaw: Large language models meet program synthesis</title>
		<author>
			<persName><forename type="first">N</forename><surname>Jain</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Vaidyanath</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Iyer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Natarajan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Parthasarathy</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Rajamani</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Sharma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 44th International Conference on Software Engineering</title>
				<meeting>the 44th International Conference on Software Engineering</meeting>
		<imprint>
			<date type="published" when="2022">2022</date>
			<biblScope unit="page" from="1219" to="1231" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">TransMap: Pinpointing mistakes in neural code translation</title>
		<author>
			<persName><forename type="first">B</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Saxena</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering</title>
				<meeting>the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="999" to="1011" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">CodeTransOcean: A comprehensive multilingual benchmark for code translation</title>
		<author>
			<persName><forename type="first">W</forename><surname>Yan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Tian</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Findings of the Association for Computational Linguistics: EMNLP 2023</title>
				<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="5067" to="5089" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<monogr>
		<ptr target="https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference" />
		<title level="m">JavaScript reference</title>
				<imprint>
			<date type="published" when="2024-03-11">March 11, 2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<monogr>
		<ptr target="https://beautiful-soup-4.readthedocs.io/en/latest/" />
		<title level="m">Beautiful Soup 4</title>
				<imprint>
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Fine-grained code clone detection with block-based splitting of abstract syntax tree</title>
		<author>
			<persName><forename type="first">T</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Xu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Fang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Zou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Jin</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis</title>
				<meeting>the 32nd ACM SIGSOFT International Symposium on Software Testing and Analysis</meeting>
		<imprint>
			<date type="published" when="2023">2023</date>
			<biblScope unit="page" from="89" to="100" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Language models are few-shot learners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Brown</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Mann</surname></persName>
		</author>
		<author>
			<persName><forename type="first">N</forename><surname>Ryder</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Subbiah</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">D</forename><surname>Kaplan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Dhariwal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Neelakantan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Shyam</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Sastry</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Askell</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="1877" to="1901" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Large language models are zero-shot reasoners</title>
		<author>
			<persName><forename type="first">T</forename><surname>Kojima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">S</forename><surname>Gu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Reid</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Matsuo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Iwasawa</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Advances in Neural Information Processing Systems</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="page" from="22199" to="22213" />
			<date type="published" when="2022">2022</date>
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
