<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>On the Use of LLMs for Upgrading Legacy Smart Contracts: An Initial Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Pinna</string-name>
          <email>pinna.andrea@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gavina Baralla</string-name>
          <email>gavina.baralla@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giacomo Ibba</string-name>
          <email>giacomof.ibba@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Roberto Tonelli</string-name>
          <email>roberto.tonelli@unica.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science, University of Cagliari</institution>
          ,
          <addr-line>Via Ospedale 72, Cagliari</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
<p>Legacy smart contracts, which were developed using outdated versions of programming languages, present significant challenges in terms of maintainability, security, and compatibility. These challenges arise from substantial changes in syntax and best practices that have been adopted in more recent versions to enhance safety and security. Notably, programming languages for smart contracts evolve at a faster pace than traditional programming languages, making timely upgrades even more critical. Upgrading these contracts to the latest version of the language is essential for mitigating known vulnerabilities, leveraging improved security features, and ensuring compatibility with contemporary blockchain environments. Large Language Models have demonstrated considerable utility in the realm of automatic code generation, thereby accelerating the development process for programmers. This paper investigates the application of LLMs for the purpose of upgrading smart contract code. In this preliminary study, we specifically examine the effectiveness of the LLM Claude 3.7 in automating the migration of legacy Solidity smart contracts to version 0.8.20 of the language. Through a series of controlled experiments, we assess Claude 3.7's performance in generating syntactically correct, secure, and functional upgraded versions of a benchmark set comprising 21 selected legacy Solidity source codes, which are representative of common use cases for smart contracts. The experimental design, which includes prompt engineering and dataset selection, aims to obtain both quantitative measurements and qualitative assessments of the modifications made to the code, the generated test suite, and the auto-generated technical reports, as well as the overall effectiveness of the approach. The results of the analysis indicate significant variability in the performance of the LLM across the tasks, particularly in relation to the varying levels of complexity inherent in the legacy code.
This trend is further substantiated by multiple analyses, including the number of iterations required to achieve a compilable result free of errors during testing, the ability to manage outdated or deprecated practices in Solidity programming, and the depth of detail provided in the generated technical reports. This study is intended to serve as a precursor to a broader investigation that will compare different LLMs in the upgrading of contracts written in various programming languages for smart contracts.</p>
      </abstract>
      <kwd-group>
        <kwd>Blockchain</kwd>
        <kwd>Smart contracts</kwd>
        <kwd>Upgrade</kwd>
        <kwd>Large Language Models</kwd>
        <kwd>Solidity</kwd>
        <kwd>Claude Anthropic</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Blockchain smart contracts, which are programs written to be executed within a blockchain, were
originally conceived to implement immutable and verifiable agreements between parties[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. This
immutability makes it particularly important to determine the presence of vulnerabilities in the code
before deploying the contract, using appropriate tools, and in any case, to follow best practices and use
the most up-to-date versions of the language and compiler[
        <xref ref-type="bibr" rid="ref2 ref3 ref4">2, 3, 4</xref>
        ]. Failing to undertake this activity
can result in security risks and asset losses[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. It is well known that smart contracts constitute the
blockchain component of all decentralized applications (dApps), and it is crucial that when updating the
blockchain component, compatibility with the rest of the system is not lost. At the same time, delaying
updates can lead to higher costs in the long term[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In the field of programmable blockchains, it is
not uncommon for smart contract code to become "legacy" in a short period of time. Indeed, in this
sector, there are frequent updates to the languages and supporting libraries, rendering the code outdated
and incompatible, even within a few months. Therefore, while it may be possible to update blockchain
smart contracts (either natively or by using specific patterns like the proxy pattern)[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], maintaining
compatibility and preventing disruption to the overall system is a crucial concern when updating the
core blockchain components of a dApp.
      </p>
      <p>
        This underscores the need for scalable, reliable, and secure methods to upgrade these contracts. Several
methodologies have been proposed to address the problem of smart contract upgrades, ranging from
manual code audits and patching, to automated bytecode rewriting and pattern-based upgrade models [
        <xref ref-type="bibr" rid="ref8 ref9">8,
9</xref>
        ]. Despite their effectiveness, these approaches share common limitations, such as requiring significant
manual effort, specialized technical knowledge, and the risk of introducing further vulnerabilities or
breaking compatibility during the upgrade process. In this context, automated methodologies based on
advanced artificial intelligence techniques, such as Large Language Models (LLMs), have emerged as
promising alternatives capable of bridging these gaps[
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The proven effectiveness of LLMs in Solidity
programming tasks has set a significant precedent for recognizing their potential in smart contract
development [
        <xref ref-type="bibr" rid="ref11 ref12 ref3">11, 12, 3</xref>
        ]. This capability is in line with the vision of making smart contract upgrades
more accessible and scalable [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The types of issues in legacy contracts that LLMs could help address
during the upgrade process include specific vulnerabilities such as function selection conflicts and
storage slot collisions [
        <xref ref-type="bibr" rid="ref13 ref4">13, 4</xref>
        ].
      </p>
      <p>
        In this work, we explore and evaluate the application of LLMs in automating the upgrade of legacy
smart contract code. For this initial study, we focus on upgrading legacy Solidity code to a modern
language version, namely Solidity 0.8.20, and we focus our attention on Claude 3.7, an advanced LLM
developed by Anthropic 1. The selection of Claude 3.7 is underpinned by its demonstrated ability to
prioritize security issues in code generation, an essential feature when dealing with the inherent
vulnerabilities in legacy contracts [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>Our research is structured into three distinct phases. The first phase involves the creation of a dataset
comprising 21 legacy smart contracts, carefully selected to represent common use cases within the
Solidity ecosystem. This dataset serves as the foundation for our experimental investigations. In the
second phase, we engage in prompt engineering to define specific prompts that guide the model’s tasks,
establishing a systematic workflow for conducting experiments with Claude 3.7. This ensures clarity
in the objectives and tasks assigned to the model, facilitating a structured approach to the evaluation
process. The final phase consists of executing the experiments and analyzing the results. We focus on
assessing the model’s performance in generating upgraded code, examining the variation in lines of
code (LoC), and evaluating the quality of the auto-generated reports. This structured approach not only
allows for a thorough evaluation of Claude 3.7’s capabilities but also lays the groundwork for future
research aimed at comparing multiple LLMs and expanding the scope to include a broader range of
programming languages and smart contract scenarios.</p>
      <p>The structure of this paper is as follows: Section 2 examines previous applications of smart contract
upgrades and the use of LLMs in smart contract development, identifying gaps that this study aims
to address. Section 3 describes the methodology, including the experimental design used to evaluate
the performance of Claude 3.7 in upgrading legacy smart contracts. Results and analysis are presented
in Section 4, which provides a thorough assessment of both the process and the model’s performance.
The Discussion section interprets these findings, and the Conclusions section summarizes the study’s
contributions.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>In recent years, significant research efforts have explored the application of large language models within
the domain of smart contract development, particularly focusing on automating contract generation,
validation, vulnerability detection and upgrade processes.</p>
      <p>
        Zhao et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed SCCLLM, a novel and promising approach that leverages both LLMs and
in-context learning for automatic smart contract comment generation. Their method, consisting of a
two-phase strategy, considers semantic, syntactic, and lexical information to retrieve relevant examples
from a historical corpus. These examples are then used as demonstrations for in-context learning with
ChatGPT, enabling the model to generate high-quality comments without parameter updating.
      </p>
      <p>
        Barbàra et al. [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] investigated the feasibility of using LLMs to generate production-ready Solidity
smart contracts for non-technical users. Using a lease agreement as a case study, they employed GPT-4
with different prompt designs following the CO-STAR methodology. While 94.1% of generated contracts
compiled successfully and most showed only low-impact vulnerabilities in automated tests, expert
analysis revealed critical logical flaws that automated tools couldn’t detect. The researchers found that
current LLMs are incapable of generating production-ready smart contracts, highlighting the significant
gap between syntactically correct code and functionally sound implementations.
      </p>
      <p>
        Chatterjee and Ramamurthy [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] conducted a comprehensive evaluation of various Large Language
Models (LLMs) for generating Solidity smart contracts on the Ethereum blockchain. Their methodology
involved testing the generated contracts for accuracy, efficiency, and code quality using both descriptive
and structured prompting techniques across three contract scenarios of increasing complexity. Their
findings revealed that all models struggled with more complex implementations and most LLMs
overlooked critical security considerations unless explicitly prompted. The authors concluded that
current LLMs show promise for adapting existing smart contracts or assisting developers but are not
yet suitable for industrial smart contract generation due to security and efficiency concerns.
      </p>
      <p>
        Napoli et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] evaluated four leading LLMs (GPT-4-Turbo, Claude-3.5-Sonnet, Mistral-Large, and
Gemini-1.5-Pro) for automated smart contract generation from legal agreements. Their framework
assessed functional completeness and security across five agreement types using eleven design patterns.
Results showed Claude and GPT-4-Turbo significantly outperforming other models, though all generated
contracts contained security vulnerabilities. While LLMs demonstrated promising capabilities in code
generation, they concluded that current models require substantial human oversight for
production-ready smart contracts, particularly for complex agreements.
      </p>
      <p>
        Boi et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] proposed using fine-tuned Large Language Models for smart contract vulnerability
detection. They fine-tuned Llama-2-7b-chat-hf on a dataset of smart contract vulnerabilities, creating
a unified mapping between OWASP and SWC classifications. Their model achieved 59.5% accuracy
across vulnerability types, performing especially well on arithmetic vulnerabilities (93.3%). Though not
outperforming specialized tools like Mythril, their approach offers greater accessibility to non-security
experts and provides contextual remediation advice, representing a promising direction for integrating
LLMs into blockchain security workflows.
      </p>
      <p>
        Baralla et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] assessed GitHub Copilot for Solidity development, examining its capabilities in
code generation, implementation assistance, vulnerability detection, and unit testing. Results showed
Copilot excels with simple contracts but struggles with complex logic and security patterns. While
beneficial for standard implementations, Copilot requires human oversight for security-critical smart
contract development.
      </p>
      <p>
        Karanjai et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] presented SolMover, a dual-LLM framework for translating Solidity smart contracts
to Move language. Their approach combines concept mining through retrieval-augmented generation
with subtask-based code production, enhanced by iterative compiler feedback. SolMover significantly
outperformed single-model approaches, successfully translating 54.6% of contracts versus 31.2% for
GPT-3.5, demonstrating that LLMs can effectively generate code in low-resource languages with minimal
fine-tuning.
      </p>
      <p>
        Ibba et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] developed a methodology using Claude 3.5 and GPT-4 to generate synthetic Ethereum
smart contracts with Denial of Service vulnerabilities. Their research showed Claude outperformed
GPT-4, requiring fewer prompts and producing higher quality outputs. The study addressed the lack
of training data for machine learning security tools by creating realistic vulnerable contracts through
structured prompt engineering. This approach enables better development of classification and anomaly
detection models for blockchain security.
      </p>
      <p>However, despite these advances, research on the use of LLMs to upgrade legacy smart contracts
remains limited. Most studies focus on generating new contracts or analyzing vulnerabilities, leaving
a gap in the literature regarding the application of LLMs to facilitate the upgrade and migration of
existing contracts. This highlights the need for further research to explore how LLMs can be effectively
used to modernize and improve legacy smart contracts while ensuring data security and integrity.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Methodology</title>
      <p>This study employs a methodology based on controlled experiments to evaluate the effectiveness of
using LLMs for the automatic upgrading and testing of smart contracts written in legacy programming
languages. Additionally, it aims to assess the overall efficacy of this approach. As a preliminary
investigation, this work focuses on a single recently released LLM (Claude 3.7 Sonnet), and the evaluation
of both the results and the methodology will provide a foundation for future, more extensive studies.</p>
      <p>The research methodology is organized into four distinct phases. The first phase involves creating a
dataset of legacy smart contracts. The second phase focuses on defining the prompts and the experimental
workflow. The third phase consists of executing the experiments. The last phase entails analyzing the
results and evaluating the effectiveness of the approach.</p>
      <p>In the following sections, we first describe the creation and composition of our experimental dataset,
which consists of legacy smart contract source codes. Next, we outline the characteristics of the selected
LLM system, including its limitations and our strategies for effective utilization. Subsequently, we
provide a detailed account of the iterative process employed in designing, optimizing, and refining the
prompts to accurately guide the model in generating both upgraded source codes and the corresponding
test suites. Finally, we present the workflow of the experiments, detailing each step of our systematic
evaluation to minimize potential biases.</p>
      <sec id="sec-3-1">
        <title>3.1. Experimental dataset</title>
        <p>To evaluate the effectiveness of LLMs in upgrading legacy source code, it is essential to have a
heterogeneous and representative set of contracts that encompasses both the types of use cases and the
legacy versions of the programming language. However, given the preliminary nature of this work, the
objective is to maintain a limited dataset, designed to provide an initial representative sample while
minimizing complexity and ensuring the preliminary study remains manageable in scope.</p>
        <p>In constructing the dataset, we focused exclusively on smart contracts written in Solidity and considered
two sources: Etherscan verified contracts and the official Solidity documentation. The selection criteria
for the source codes are based on the need to include various types of use cases and, for each use case,
different legacy versions of the programming language. This selection yielded the 21 source codes
that compose the experimental dataset. Six of these—Ballot, SimpleAuction, BlindAuction,
Purchase, ReceiverPays, and SimplePaymentChannel—were sourced from the official Solidity
documentation 2. The remaining contracts were collected from Etherscan, and their corresponding Ethereum
addresses are listed in Table 1.</p>
        <p>As reported in Table 2, the set of use cases represented in this dataset includes ERC20 tokens, custom
token implementations (with additional logic), auctions, voting mechanisms, crypto transfers, time
locks, and vesting schedules. Many contracts in the dataset incorporate the SafeMath library to handle
arithmetic operations safely, preventing problems such as overflows and underflows, which were critical
vulnerabilities in previous versions of Solidity before automatic checks were implemented in version
0.8.0. Pragma versions, namely the version of the Solidity compiler specified inside the source file, range
from 0.4.16 to 0.6.12, reflecting a cross-section of syntax rules, functions, and keywords that have been
deprecated in the current version of the language. Source codes vary in size and structural complexity,
with lines of code ranging from fewer than 50 to over 500, and function counts from 3 to more than 50.</p>
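        <p>To make this kind of migration concrete, the sketch below contrasts a hypothetical legacy-style token fragment, as it might appear under an early pragma with SafeMath, with its Solidity 0.8.20 counterpart. This is an illustrative sketch only, not a contract from the dataset; the contract and function names are invented for this example.

```solidity
// Illustrative sketch (not from the dataset): the same transfer logic
// in legacy style and after an upgrade to Solidity 0.8.20.

// --- Legacy style (e.g., pragma solidity ^0.4.16) ---
// pragma solidity ^0.4.16;
// contract LegacyToken {
//     using SafeMath for uint256;          // manual overflow protection
//     mapping(address => uint256) balances;
//     function transfer(address to, uint256 value) public {
//         balances[msg.sender] = balances[msg.sender].sub(value);
//         balances[to] = balances[to].add(value);
//     }
// }

// --- Upgraded style ---
pragma solidity 0.8.20;

contract UpgradedToken {
    mapping(address => uint256) private balances;

    // Same external interface as the legacy version; SafeMath is no
    // longer needed because arithmetic reverts on overflow/underflow
    // since Solidity 0.8.0.
    function transfer(address to, uint256 value) public {
        balances[msg.sender] -= value;  // reverts with panic 0x11 on underflow
        balances[to] += value;
    }
}
```
</p>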
        <p>In order to evaluate a potential relationship between program complexity and the quality of the work
performed by the LLM, we categorize the source codes into three levels of complexity: low, medium,
and high.</p>
        <sec id="sec-3-1-1">
          <p>2 https://docs.soliditylang.org/en/v0.5.5/solidity-by-example.html</p>
          <p>Low-complexity files have fewer than 100 lines of code and fewer than 10 functions, often focusing
on a single functionality with minimal internal contracts. Medium-complexity files have 100 to 300
lines of code, contain 10 to 30 functions, and may also involve moderate multi-contract structures
or custom logic. High-complexity files exceed 300 lines of code or 30 functions, and are often characterized by
large codebases, advanced functionality, and multiple internal contracts. As reported in Table 1, of the
21 selected contracts, 7 were classified as low complexity, 8 as medium, and 6 as high complexity.</p>
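          <p>One reasonable reading of these thresholds can be expressed as a small classifier. This is a sketch under the assumption that the "or" rule for high complexity takes precedence and that files matching neither extreme fall into the medium band; the paper does not state how conflicting metrics are resolved, and the function name is invented here.

```javascript
// Sketch of the complexity categories described above; one reading of
// how the two metrics (LoC and function count) combine.
function classifyComplexity(linesOfCode, functionCount) {
  // High: over 300 LoC or more than 30 functions.
  if (linesOfCode > 300 || functionCount > 30) return "high";
  // Medium: 100-300 LoC or 10-30 functions.
  if (linesOfCode >= 100 || functionCount >= 10) return "medium";
  // Low: fewer than 100 LoC and fewer than 10 functions.
  return "low";
}
```
</p>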
          <p>Despite the relatively small number of contracts included, the diversity of these contracts provides
a preliminary foundation for assessing the LLM’s ability to manage upgrades, maintain interface
consistency, and address deprecated features.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. LLM Interaction</title>
        <p>We have chosen to use a single LLM to consolidate our approach and facilitate future studies comparing
the performance of different LLMs. The LLM selected for this study is Claude 3.7 Sonnet, released in
February 2025 3.</p>
        <p>
          Comparative studies have positioned Claude, alongside GPT-4-Turbo, as superior in handling complex
coding tasks, supporting our choice for this initial research. The performance hierarchy noted in these
studies, where Claude excels at both syntactic correctness and managing programming complexity,
firmly aligns with the need for effective and secure smart contract upgrades [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>This model operates within specific computational constraints that affect user interaction patterns.
The model uses a resource allocation system where conversation length directly impacts available
interactions—approximately 45 messages every 5 hours for shorter conversations (about 200 English
sentences of 15-20 words each), decreasing to approximately 15 messages when processing larger
documents. This limitation arises from the model’s architecture, which processes the entire conversation
history, including attachments, with each new query.</p>
        <p>Users experience two notable interaction artifacts: first, the need to prompt the model with "continue"
commands when responses exceed output limits, resulting in fragmented information distribution; and
second, system notifications warning that "Long conversations consume usage limits more quickly."</p>
        <p>When the output limit is reached, we observed that in code generation, using the "continue" command
often results in trivial errors, including syntactic mistakes. This occurs because the LLM modifies the
code it generated in the previous iteration. We frequently found instances where lines of code were
not deleted correctly or where code was missing. Empirically, we discovered that if the "continue"
command is issued promptly (within a few seconds after the message appears), the quality of the edited
code improves significantly, particularly in terms of syntactic errors and missing code, especially for
longer code segments (over 300 lines)4.</p>
        <p>These constraints reflect the resource-intensive nature of maintaining and processing large contextual
information. Usage limits for Claude Pro (approximately five times higher than the free service, which
is the version used for our experiments) are reset every five hours. For optimal use, specific precautions
are necessary, such as starting new conversations for separate topics, grouping multiple questions into
single messages, and avoiding redundant file uploads to maximize available computing resources.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. LLM Prompt Structuring</title>
        <p>A key element of this study is the development of an effective prompt to assess the performance of the
LLM. The purpose of the prompt is to provide the LLM with the necessary description of the objectives
and information to carry out the desired process. For the objectives of this study, the LLM is tasked
with producing upgraded Solidity code from the legacy code and generating tests to verify its correct
functionality. Additionally, at the end of this process, a report is requested to explain the actions taken.</p>
        <p>The methodology used for defining the prompt involves refinement through successive iterations.
Given the complexity of the process of upgrading legacy Solidity smart contracts, considerable effort
was spent on iteratively refining the initial prompt before arriving at the final prompt. At each iteration,
contextual information about the local execution environment was added to the prompt. This included
details such as IDE specifications, the version of the Hardhat testing framework, Solidity compiler
versions, and local testnet configurations. This contextual embedding aimed to maximize the clarity
and relevance of the outputs generated by the LLM.</p>
        <p>The adopted prompt engineering approach consists of dividing the process into two distinct steps:</p>
        <p>• Step 1: Contract Upgrade and Testing</p>
        <p>Upgrade: To upgrade a deployed legacy Solidity contract to a newer Solidity version (specifically,</p>
        <sec id="sec-3-3-1">
          <p>3 https://www.anthropic.com/news/claude-3-7-sonnet</p>
          <p>4 We emphasize that this is an empirical result that should be further investigated; however, we observed a potential
phenomenon of temporary caching or state retention that appears to degrade over time between continuation prompts.</p>
          <p>version 0.8.20), Claude was prompted with the key requirement to ensure that external interfaces
and function declarations remained the same between legacy and upgraded contracts.
Test Suite Generation: In the same prompt, the LLM was instructed to generate a JavaScript test
suite that would work with both the legacy and upgraded versions of the contract. The test suite
must utilize the Hardhat framework and be structured to test semantic consistency across versions
without modification. Specific design requests were incorporated into the prompt, including:
– Using an environment variable to select the smart contract to test.
– Using fully qualified contract naming for precise targeting.
– Using an automatic Solidity version detection mechanism.
– Including conditional logic in test cases to accommodate internal behavioral differences
between various Solidity versions.</p>
          <p>• Step 2: Report Generation</p>
          <p>Upon successful compilation and passing of all unit tests, a second prompt requested the LLM to
produce a detailed technical report summarizing the upgrade results. The second prompt specified
that the report must include:
– A concise changelog identifying syntax modifications and new language features introduced
by upgrading to Solidity 0.8.20.
– A quantitative summary comparing the number of implemented functionalities (before and
after upgrading).
– A verification that functionalities of the legacy and upgraded contracts retained identical
interfaces and external behavior.
– A comprehensive assessment of test coverage, detailing the number of functionalities
tested in each version of the contract.
– An analysis of the necessary test adaptations made to address the differing behaviors in
the Solidity versions of the legacy and upgraded contracts, including distinctions in error
handling.
– A summary of the most significant modifications that impact contract security, efficiency,
and maintainability.</p>
          <p>The decision to separate the prompt into two steps is based on the following considerations.
– Iterative nature of the process: Upgrading smart contracts and developing corresponding test suites is inherently an iterative process. Critical issues or requirements for changes may become apparent only after the test protocols have been executed, necessitating post-implementation changes to the upgraded contract.
– Greater accuracy of documentation: A report generated following the successful completion of the contract upgrade and testing phases will inherently demonstrate greater accuracy, effectively reflecting all changes implemented during the iterative development cycle.
– Prioritization of primary objectives: The initial prompt should maintain focus on the primary activities, namely the contract upgrade and test suite development. Including excessive requirements risks reducing the effectiveness with which the primary objectives are addressed.
– Empirical basis for analysis: A subsequently generated report may be based on empirical test results rather than theoretical projections, thus providing a more substantial basis for evaluating the effectiveness of the upgrading process.
– Limitation of Claude chat length: Claude models impose constraints on the length of individual outputs, often requiring the use of continuation prompts for extensive responses.</p>
          <p>The resulting prompts are reported as follows.</p>
          <p>Prompt for upgrading and testing
Upgrade this smart contract written in Solidity to the latest pragma version (0.8.20), ensuring that the function interfaces remain identical. The project is structured with two
separate contract folders (’legacy’ and ’upgraded’) where contracts maintain the same name in both folders but use different compiler versions. Develop a single set of unit tests
in a .js file that can be used to verify the semantic correctness of both contract versions without any modifications. The tests must be compatible with both versions since the
function declarations are the same, but should account for any changes in internal behavior or error mechanics between Solidity versions. The project is configured with the
following package.json dependencies:
"devDependencies": {
"@nomicfoundation/hardhat-toolbox": "^5.0.0",
"hardhat": "^2.22.19"
}
Structure the tests using the following pattern:
The test file should follow this general structure:
describe("Contract Tests", function () {
});
• Implement an environment variable configuration (CONTRACT_VERSION) to control which contract version to test
• Use fully qualified contract names to reference the specific contract files
• Create a version detection mechanism that automatically adapts tests to the appropriate Solidity version
• Use the current Ethers v6 deployment pattern with waitForDeployment() instead of deployed()
• Include conditional test logic that can handle different error mechanisms between Solidity versions.
// The full specification is publicly available on GitHub
// at https://github.com/LLM-and-blockchain/Solidity-Claude/blob/main/prompts.md
Each test should verify that both versions of the contract maintain identical behavior from a user perspective, while adapting to internal implementation differences between
Solidity versions.</p>
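          <p>The design requests in this prompt can be sketched as a few plain JavaScript helpers. The helper names (selectedVersion, fullyQualifiedName, expectedOverflowError) and the folder layout are illustrative assumptions, not the suite actually generated by Claude; they show the intent of the environment-variable switch, the fully qualified naming, and the version-conditional error handling.

```javascript
// Sketch of the cross-version test pattern requested in the prompt.

// Choose which contract build to exercise from the CONTRACT_VERSION
// environment variable, defaulting to the legacy build.
function selectedVersion(env) {
  const v = (env.CONTRACT_VERSION || "legacy").toLowerCase();
  if (v !== "legacy" || v === "legacy") {
    if (v !== "legacy" && v !== "upgraded") {
      throw new Error(`unknown CONTRACT_VERSION: ${v}`);
    }
  }
  return v;
}

// Hardhat's fully qualified names ("path/to/File.sol:ContractName") let
// the legacy and upgraded folders reuse the same contract name
// unambiguously.
function fullyQualifiedName(version, contractName) {
  return `contracts/${version}/${contractName}.sol:${contractName}`;
}

// Version-conditional error expectation: Solidity 0.8.x reverts on
// arithmetic overflow with panic code 0x11, while pre-0.8 contracts
// typically rely on SafeMath's require(), producing a plain revert.
function expectedOverflowError(version) {
  return version === "upgraded" ? "panic code 0x11" : "revert";
}

// Inside the Hardhat suite these helpers would be used roughly as:
//   const version = selectedVersion(process.env);
//   const factory = await ethers.getContractFactory(
//     fullyQualifiedName(version, "SimpleAuction"));
//   const contract = await factory.deploy();
//   await contract.waitForDeployment();   // Ethers v6 pattern
```
</p>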
          <p>Prompt for report generation
Based on the implementation of both the legacy and upgraded smart contracts and their corresponding test results, generate a comprehensive technical report that includes:
1. A detailed changelog documenting all modifications between the legacy and upgraded contract versions, including:
• Syntax and language feature updates necessitated by the Solidity version change
• Security improvements and vulnerability mitigation
• Gas optimization techniques applied
• The technical rationale justifying each significant change
2. A quantitative analysis of functionality, including:
• Total count of functions implemented in the legacy contract
• Total count of functions implemented in the upgraded contract
• Any new capabilities introduced in the upgraded version
• Verification that all legacy functionalities remain accessible through identical interfaces
3. A test coverage assessment:
• Count of functionalities successfully tested in the legacy version
• Count of functionalities successfully tested in the upgraded version
• Description of any version-specific test adjustments required, indicating whether they concern the contract code implementation or the test code syntax
• Analysis of edge cases and how they were handled differently between versions
4. A concise executive summary highlighting the most significant changes and their impact on contract security, efficiency, and maintainability.</p>
          <p>The full prompts can be accessed in the GitHub repository of this research<sup>5</sup>.</p>
        </sec>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Experimental workflow</title>
        <p>The experimental workflow begins by examining each legacy smart contract file one by one and ends
with the production of a dataset of upgraded smart contracts and the reports generated for each upgrade.
A representation of the experimental workflow is illustrated in Fig. 1.</p>
        <p>To ensure that the process of upgrading each contract remained independent and to avoid bias, each
experiment is conducted in a new, separate chat session with the Claude 3.7 LLM.</p>
        <p>Each conversation with the LLM begins with the first step of the prompting process described above,
which includes uploading the legacy contract source and a written request to upgrade the legacy code
to Solidity 0.8.20, ensuring that interfaces are preserved and generating the corresponding test suites.</p>
        <p>Once Claude 3.7 produces the upgraded contract and the related test suite, these outputs are compiled
and executed locally in the environment described below.</p>
        <p>Whenever errors occur, either during compilation or test execution, the error report displayed in
the terminal is copied and pasted directly into the running chat, in order to minimize any additional context that
could bias the LLM. This iterative process led to resolving syntactic and semantic errors, ensuring
correct code for the upgraded contract and robust test suites.</p>
        <sec id="sec-3-4-1">
          <title>5https://github.com/LLM-and-blockchain/Solidity-Claude</title>
          <p>When the upgraded contract compiles successfully and the test suite passes all checks, Claude 3.7 is
provided with the second-step prompt detailed above. This prompt requests the generation of a detailed
technical report about the produced code and the iterations required to achieve a fully functional
upgrade. This final step ensures that each upgrade is thoroughly documented, providing valuable
insights for interpreting the results.</p>
          <p><bold>3.4.1. Experimental setup</bold></p>
          <p>The primary frameworks and tools utilized for the experiments include the LLM Claude 3.7, accessed
remotely via the web interface, as well as a local environment consisting of the following components:
the Visual Studio Code IDE (version 1.98.2), an EVM testnet based on Ganache (version 7.9.2), and the
Hardhat framework (version 2.22.19) for compiling and testing smart contracts. Additionally, the
environment includes Node.js (version 20.12.2) and Web3.js (version 1.10.0) to ensure compatibility and
facilitate efficient interaction with Ganache. Multiple versions of the Solidity compiler were maintained,
providing the flexibility to switch between versions as needed for compiling both legacy and upgraded
smart contracts. The local environment runs on an Asus ExpertBook powered by a 12th Gen Intel(R)
Core(TM) i7-1265U processor running at 1.80 GHz, complemented by 40 GB of installed RAM (39.7 GB
usable). The system operates on Windows 11 Pro in a 64-bit x64 architecture.</p>
        </sec>
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Evaluation of results</title>
        <p>The results of the experiments are evaluated in both quantitative and qualitative terms.</p>
        <p>In terms of quantitative aspects, the focus is on the number of iterations required to obtain a compilable
code and to pass the tests. This data provides a clear indication of the efficiency of the upgrade process.</p>
        <p>For the qualitative assessment, a manual inspection is conducted to compare the produced upgraded
Solidity code with the legacy code, also in relation to what is described in the generated report. This
inspection enables the evaluation of the entire set of experiments and reveals whether the LLM
consistently adopts the same strategy to solve a specific problem or whether its approach varies depending
on the changes in the examined source code.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Analysis</title>
      <p>In this section, we analyze three types of results: the performance of the process, the metrics of the
upgraded contracts, and finally, the analysis of the auto-generated reports. The experimental results
and the dataset are available on the GitHub repository of this research<sup>6</sup>.</p>
      <sec id="sec-4-1">
        <title>4.1. Process analysis</title>
        <p>The initial evaluation of Claude 3.7’s performance in generating upgraded Solidity code and its
corresponding tests is quantitative. It is based on the number of iterations required to produce compilable
code using version 0.8.20 of the Solidity compiler and to ensure the successful execution of the tests.</p>
        <p>Table 3 summarizes these data and also includes instances where additional prompts were
necessary (beyond simply copying and pasting errors encountered during compilation or test execution)
to guide the language model toward the desired outcome. Additionally, it notes the number of times
the prompt "continue" was required to address the output length limitation of Claude 3.7, as discussed
earlier.</p>
        <p>We can observe that in just over half of the experiments (11 out of 21), the number of iterations
required for compilation is zero, meaning the LLM produced directly compilable code with the specified
version of the compiler. In five cases, only one additional iteration was needed. In four cases, specifically
BNItoken, Shop, Eloncat, and PonderAirDropToken, two additional iterations were necessary. In one
case, Omosubi, four iterations were required before the LLM produced compilable code.</p>
        <sec id="sec-4-1-1">
          <title>6https://github.com/LLM-and-blockchain/Solidity-Claude</title>
          <p>Regarding the generation of the test suite, it is noteworthy that only in three cases did the tests work
successfully on the first output. In most cases, one or two iterations were needed. In one instance, four
iterations were required for PonderAirDropToken, and seven for Shop. These two experiments also
required more additional prompts, as well as error messages, as detailed in the table.</p>
          <p>Finally, regarding the number of times it was necessary to send "continue" to the LLM, it is noteworthy
that this operation was not required in over half of the cases (13 out of 21).</p>
          <p>Analyzing the statistics presented in Table 4, which are categorized according to the complexity
criteria defined in the previous section, we can observe that the statistics related to the number of
iterations needed to achieve correct compilation tend to increase with the complexity of the legacy code.
This trend is particularly pronounced in the high complexity category, where the average number of
iterations is approximately six times that of low complexity and four times that of medium complexity.</p>
          <p>Similarly, for the iterations required to obtain successfully passing tests without errors, the high
complexity category also shows an average that is about twice that of both low and medium complexity.
When considering the total number of iterations, it is evident that low complexity has a slightly higher
average (about 17%) compared to medium complexity, while high complexity sources require, on average,
more than two and a half times the number of iterations needed for the other two categories.</p>
          <p>Finally, it is interesting to note that the average number of "continue" commands needed to obtain the
desired output is zero for all contracts in the low complexity category, approximately 0.7 for contracts
in the medium complexity category, and 2.7 for contracts in the high complexity category.</p>
          <p>It is worth noting that the standard deviations are all high relative to the mean, primarily due to the
small number of experiments and the heterogeneity of the contracts examined.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Analysis of generated upgraded contracts</title>
        <p>The second analysis conducted involves a quantitative evaluation of the variation of LoC and a manual
inspection of the upgraded code produced.</p>
        <p>Overall, on average, the LLM extends upgraded contracts by 2.2% in LoC with respect to legacy
ones. However, when contracts are grouped by complexity, it is observed that low-complexity contracts
are extended on average by 11.52%, while medium-complexity contracts are extended by 7.86%. In contrast,
highly complex contracts are, on average, shortened by 12.62% in terms of the number of lines of code.
These variations are represented in Fig. 2.</p>
        <p>The automatic upgrades of legacy Solidity smart contracts, performed by the Claude 3.7 LLM, aimed
to maintain the same interface as the legacy contracts while implementing various modifications to
meet the syntax rules of Solidity 0.8.20, as well as to enhance safety and security. A summary of the
manual inspection is reported in Table 4.2.</p>
        <p>Overall, the upgrade process corrects the syntax of keywords used to define constructs (such as
abstract and interface) and functions (like override), as well as built-in modifiers (including
external, internal, and view). The upgrade consistently replaces the deprecated now keyword with
block.timestamp and always adds messages to require statements when they are missing. In
two instances, it retains assert statements without accompanying comments. Additionally, it almost
always removes the visibility specifier from the constructor, leaving it intact in only one case.</p>
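        <p>These mechanical corrections can be illustrated with a minimal before/after sketch (a hypothetical contract, not taken from the benchmark set):</p>

```solidity
// Legacy (Solidity 0.4.x) fragments typical of the benchmark contracts:
//   function MyVault() public { owner = msg.sender; }   // old-style named constructor
//   require(msg.value > 0);                             // require without a message
//   deadline = now + 1 days;                            // deprecated 'now' alias

// Upgraded (Solidity 0.8.20) equivalent:
pragma solidity 0.8.20;

contract MyVault {
    address public owner;
    uint256 public deadline;

    constructor() {                                  // 'constructor' keyword, no visibility specifier
        owner = msg.sender;
    }

    function start() external payable {
        require(msg.value > 0, "No value sent");     // explanatory message added
        deadline = block.timestamp + 1 days;         // 'now' replaced by block.timestamp
    }
}
```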
        <p>A significant concern is the inconsistent strategies employed by the LLM in managing the SafeMath
library, which has traditionally been used in Solidity programming to prevent arithmetic overflow and
underflow. In many instances, the language model retained SafeMath even though recent versions
of Solidity now include built-in overflow checks. While it is understandable to want to maintain
legacy interfaces, the decision to keep SafeMath when it is no longer necessary raises questions about
the effectiveness of the upgrade. Retaining outdated dependencies can complicate the code and may
introduce vulnerabilities, especially if developers are not fully aware of the implications of using
such libraries in a modern context. On the other hand, contracts that have successfully removed
SafeMath in favor of Solidity’s built-in safety features demonstrate a positive adaptation to evolving best
practices. However, the inconsistency in this approach across different contracts reveals a fragmented
understanding of these advancements.</p>
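        <p>The distinction can be sketched with a hypothetical fragment: since Solidity 0.8.0, plain arithmetic operators revert on overflow and underflow, making SafeMath wrappers redundant, while intentional wrap-around behavior must be requested explicitly:</p>

```solidity
pragma solidity 0.8.20;

contract Balances {
    mapping(address => uint256) public balanceOf;

    // Legacy style (pre-0.8, with SafeMath):
    //   balanceOf[to] = balanceOf[to].add(amount);
    // Upgraded style: plain operators are checked by default and revert
    // with panic code 0x11 on overflow.
    function credit(address to, uint256 amount) external {
        balanceOf[to] += amount;
    }

    // Where legacy wrap-around semantics must be preserved intentionally,
    // 0.8.x provides an explicit escape hatch:
    function wrappingAdd(uint256 a, uint256 b) external pure returns (uint256) {
        unchecked { return a + b; }
    }
}
```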
        <p>The LLM also used a varied approach to upgrade the occurrences of the .transfer method of a
payable address. While it generally upgrades the contract from the first iteration in line with
best practices, replacing this invocation with the low-level method .call{value: ... } and
an associated require check on the success of the transaction, there are two instances where it retains
.transfer, despite the fact that its use is currently not recommended for security reasons.</p>
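        <p>The replacement pattern can be sketched as follows (hypothetical fragment; access control omitted for brevity):</p>

```solidity
pragma solidity 0.8.20;

contract Payout {
    // Legacy pattern: recipient.transfer(amount);
    // forwards a fixed 2300 gas stipend and can fail with contract recipients.

    // Upgraded pattern generally applied by the LLM:
    function withdraw(address payable recipient, uint256 amount) external {
        (bool success, ) = recipient.call{value: amount}("");
        require(success, "Transfer failed");   // explicit success check required with .call
    }
}
```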
        <p>In some cases, the legacy contract presents a function to destroy the contract. To obtain a replacement of
the deprecated selfdestruct, it was necessary to interact with the LLM to address the compilation warnings
in the upgraded contract. To maintain the contract’s destruction functionality while avoiding the use of
selfdestruct, the language model introduces a locking mechanism that disables all features of the
contract and contextually transfers the contract’s funds to the contract owner.</p>
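        <p>A minimal sketch of such a locking mechanism, assuming the shape described above (not the exact code produced by the LLM), could look like this:</p>

```solidity
pragma solidity 0.8.20;

contract Destructible {
    address public owner = msg.sender;
    bool public destroyed;

    modifier notDestroyed() {
        require(!destroyed, "Contract is destroyed");
        _;
    }

    // Replaces the deprecated selfdestruct: permanently disables the
    // contract's features and forwards its balance to the owner.
    function destroy() external {
        require(msg.sender == owner, "Only owner");
        destroyed = true;
        (bool ok, ) = payable(owner).call{value: address(this).balance}("");
        require(ok, "Fund transfer failed");
    }

    function deposit() external payable notDestroyed {
        // normal functionality, blocked once 'destroyed' is set
    }
}
```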
        <p>Notably, the LLM typically adds an MIT license when none is specified, although there is one instance
where it does not.</p>
        <p>Results of the manual inspection of the upgraded contracts in comparison with the legacy code.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Analysis of Auto-generated Reports</title>
        <p>The third analysis examines the generated reports. The generated reports generally comply with the
explicit requests sent to the LLM via the prompt in step 2. The LLM created these reports in
Markdown format; they are available, unchanged, in the repository for this study.</p>
        <p>Each report includes a title and begins with a summary of the operations performed by the LLM,
accurately stating both the version of the legacy smart contract language and the target version, which
is 0.8.20.</p>
        <p>All reports are divided into sections. In every case, the first two sections are dedicated to the changelog
and the comparison of feature counts present in both versions of the contract. The report then focuses
on the test suite, evaluating it in terms of feature coverage and describing any necessary adjustments
for the proper execution of the tests (sometimes in two sections, sometimes in one).</p>
        <p>In most cases, the report outlines the benefits in terms of security, efficiency, and maintainability of
the code achieved following the upgrade. It generally concludes with a brief summary or reflection on
the work performed.</p>
        <p>Although the structure is similar, the reports differ significantly in terms of detail and content
representation. Specifically, the reports are uniform only in the initial summary and the subsequent
tabular changelog. Beyond that, there is no consistency in representation, as the same topic may be
presented in narrative form, with bullet points, or in a tabular format. Table 6 provides a quick
overview of the number of subsections that the reports use to describe each section, highlighting the
significant differences between these reports. By analyzing the average total number of subsections for
each report, grouped by the complexity of the legacy code, it is evident that reports for low-complexity
code tend to be less extensive (averaging around 7.6 sections) compared to the other two cases, which
are equivalent at approximately 10 sections each. These results are summarized in Table 7.</p>
        <p>In the cases of the Shop, EtherTool, ReceivePays, SimplePaymentChannel contracts, and the
FreePalestine token, the changelog section includes entire portions of modified code, allowing for direct
comparison through "before" and "after" comments in the first three cases, or using git-style highlighting (with
lines marked in green or red) in the latter two. Notably, the SimplePaymentChannel report describes how
the contract was modified to eliminate the use of selfdestruct. In other cases, the report highlights the
names of functions or keywords in the code that were modified.</p>
        <p>The impact assessment is generally presented as a structured list divided into three parts, one for
each topic. In the case of SepuToken, the impact assessment includes a qualitative score for the three
categories considered: Security score, Efficiency score, and Maintainability score (indicating Security as
enhanced, Efficiency as neutral, and Maintainability as significantly enhanced).</p>
        <p>Regarding the management of SafeMath, in cases where the upgrade removes or inhibits this library,
the report highlights the benefits of using built-in overflow protection in terms of contract conciseness,
gas usage, and security. Where SafeMath is retained, the report emphasizes how the new contract
maintains the same behavior and structure as the legacy contract.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>This work represents a preliminary and exploratory study that examines a dataset of 21 legacy sources
and a single LLM, specifically Claude 3.7. The primary objectives are to evaluate the LLM’s approach,
enable manual examination, and obtain both quantitative and qualitative assessments, as well as insights
for a more comprehensive study that would involve a systematic and automated comparison between
multiple LLMs and a wider range of smart contracts.</p>
      <p>The ability of the LLM to produce functioning code with zero or few iterations is encouraging, as
it demonstrates the model’s understanding of the characteristics and keywords of the Solidity 0.8.x
language, allowing it to perform upgrades by modifying the relevant parts of the legacy code while
preserving the logic and interface of the functions.</p>
      <p>The need to iterate multiple times for the tests suggests that the LLM may not have a complete
mastery of creating comprehensive test suites. There were a few cases where additional prompt details
had to be provided to guide the LLM towards returning consistent responses. In other instances, minor
syntax issues were manually corrected to avoid unnecessarily overloading the chat interactions.</p>
      <p>The findings also suggest that the length and complexity of the legacy code influence the
number of iterations required and the quality of the generated reports.</p>
      <p>Additionally, the fact that the LLM’s proposed solutions were not consistent across all upgraded
smart contracts indicates the need to further refine the prompts to establish clear rules for handling
certain situations. For example, it may be beneficial to encourage code changes that align with best
practices rather than maintaining the same interface, as this has led to anomalous behavior when
updating SafeMath. Similarly, this approach should be applied to the handling of transfer and
selfdestruct functions.</p>
      <p>The stylistic diferences and varying levels of detail in the generated reports suggest the need to
better structure the prompts, potentially by providing guidance on the required format. Lastly, the
study has identified some interesting insights, such as the inclusion of a changelog with the code and
the potential for an enhancement score.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusions</title>
      <p>This initial study explored the potential of Claude 3.7 to automate the upgrade of legacy Solidity smart
contracts to version 0.8.20. The adopted methodology allowed for a satisfactory evaluation of both the
LLM’s performance and the approach itself. Our findings highlight both the capabilities and limitations
of using LLMs for this specialized task. Claude 3.7 demonstrated the ability to generate upgraded
and compilable smart contracts with minimal iterations, particularly for low and medium complexity
contracts. However, performance significantly degraded with increased contract complexity, indicating
that high-complexity contracts required more iterations to achieve satisfactory results.</p>
      <p>While the model successfully adapted core syntax, it exhibited inconsistencies in handling critical
security aspects, such as the variable treatment of the SafeMath library and the updating of deprecated
functions. These inconsistencies suggest that, although the LLM can automate much of the upgrade
process, it lacks a consistent understanding of security best practices.</p>
      <p>The generation of test suites proved to be more challenging for the model, requiring more iterations
than the contract upgrades themselves, which indicates that creating efective test suites remains
a complex task. The reports generated by the model showed a solid understanding of the upgrade
dimensions, but their structure and detail varied significantly, with simpler contracts receiving slightly
less extensive documentation.</p>
      <p>This research demonstrates that Claude 3.7 has promising capabilities to reduce the manual effort
required in smart contract upgrades. However, its limitations indicate that it should be viewed as an
assistive tool rather than a complete replacement for human expertise.</p>
      <p>We interpret our results in the context of smart contract security and upgrade automation,
acknowledging potential biases and limitations of the study design. It is clear that human intervention remains
necessary in scenarios involving complex contracts and critical security considerations. Future research
should focus on expanding the dataset to include a wider variety of smart contracts, exploring the LLM’s
capabilities in other blockchain programming languages, and integrating specialized security analysis
tools to assess vulnerabilities in upgraded contracts. Balancing automation with human oversight is
crucial, particularly for security-critical blockchain applications, allowing developers to concentrate on
complex logic changes and security verifications, ultimately leading to more efficient and secure smart
contract upgrades.</p>
      <p>This work was partially supported by project SERICS (PE00000014) under the MUR National Recovery
and Resilience Plan funded by the European Union-NextGenerationEU. Additionally, this research is
part of the "IMASS CHAIN - Infrastructure Management Support System Chain," co-funded under the
National Military Research Plan 2020, with CIG: 884399685F and CUP: D84H22001380001.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this manuscript, the authors used Duck.AI (model GPT-4o, in anonymous
mode) to check grammar and spelling and to paraphrase and reword text. After using this tool/service,
the authors reviewed and edited the content as needed and take full responsibility for the
publication’s content. As described within this work, the authors used Claude 3.7 to perform their investigation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T. D.</given-names>
            <surname>Nguyen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. H.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <article-title>Sguard: towards fixing vulnerable smart contracts automatically</article-title>
          ,
          <source>in: 2021 IEEE Symposium on Security and Privacy (SP)</source>
          , IEEE,
          <year>2021</year>
          , pp.
          <fpage>1215</fpage>
          -
          <lpage>1229</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>X.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <article-title>Characterizing ethereum upgradable smart contracts and their security implications</article-title>
          ,
          <source>in: Proceedings of the ACM Web Conference</source>
          <year>2024</year>
          ,
          <year>2024</year>
          , pp.
          <fpage>1847</fpage>
          -
          <lpage>1858</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>Automatic smart contract comment generation via large language models and in-context learning</article-title>
          ,
          <source>Information and Software Technology</source>
          <volume>168</volume>
          (
          <year>2024</year>
          )
          <fpage>107405</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>B.</given-names>
            <surname>Boi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Esposito</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Smart contract vulnerability detection: The role of large language model (llm)</article-title>
          ,
          <source>ACM SIGAPP Applied Computing Review</source>
          <volume>24</volume>
          (
          <year>2024</year>
          )
          <fpage>19</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Chatterjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Ramamurthy</surname>
          </string-name>
          ,
          <article-title>Efficacy of various large language models in generating smart contracts</article-title>
          ,
          <source>in: Future of Information and Communication Conference</source>
          , Springer,
          <year>2025</year>
          , pp.
          <fpage>482</fpage>
          -
          <lpage>500</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. I.</given-names>
            <surname>Lunesu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Orrù</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tonelli</surname>
          </string-name>
          ,
          <article-title>Investigation on self-admitted technical debt in open-source blockchain projects</article-title>
          ,
          <source>Future Internet</source>
          <volume>15</volume>
          (
          <year>2023</year>
          )
          <fpage>232</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Bartoletti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Benetollo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bugliesi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Crafa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Dal Sasso</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pettinau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Pinna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Piras</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rossi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Salis</surname>
          </string-name>
          , et al.,
          <article-title>Smart contract languages: A comparative analysis</article-title>
          ,
          <source>Future Generation Computer Systems</source>
          <volume>164</volume>
          (
          <year>2025</year>
          )
          <fpage>107563</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Rodler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. O.</given-names>
            <surname>Karame</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Davi</surname>
          </string-name>
          ,
          <article-title>EVMPatch: Timely and automated patching of ethereum smart contracts</article-title>
          ,
          <source>in: 30th usenix security symposium (USENIX Security 21)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1289</fpage>
          -
          <lpage>1306</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hossain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. H.</given-names>
            <surname>Bappy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T. S.</given-names>
            <surname>Zaman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Islam</surname>
          </string-name>
          ,
          <article-title>Seam: A secure automated and maintainable smart contract upgrade framework</article-title>
          ,
          <source>arXiv preprint arXiv:2412.00680</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Barbàra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gatteschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schifanella</surname>
          </string-name>
          ,
          <article-title>Automatic smart contract generation through LLMs: When the stochastic parrot fails</article-title>
          ,
          <source>in: 6th Distributed Ledger Technology Workshop</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Barbàra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gatteschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schifanella</surname>
          </string-name>
          ,
          <article-title>Leveraging large language models for automatic smart contract generation</article-title>
          ,
          <source>in: 2024 IEEE 48th Annual Computers, Software, and Applications Conference (COMPSAC)</source>
          , IEEE,
          <year>2024</year>
          , pp.
          <fpage>701</fpage>
          -
          <lpage>710</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>R.</given-names>
            <surname>Karanjai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <article-title>Teaching machines to code: Smart contract translation with LLMs</article-title>
          ,
          <source>arXiv preprint arXiv:2403.09740</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>E. A.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Romani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Gatteschi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Schifanella</surname>
          </string-name>
          ,
          <article-title>Light and shadows of smart contract development with LLMs</article-title>
          ,
          <source>Available at SSRN</source>
          <volume>5189331</volume>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>G.</given-names>
            <surname>Baralla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ibba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tonelli</surname>
          </string-name>
          ,
          <article-title>Assessing GitHub Copilot in Solidity development: Capabilities, testing, and bug fixing</article-title>
          ,
          <source>IEEE Access</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Ibba</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Baralla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Destefanis</surname>
          </string-name>
          ,
          <article-title>Large language models for synthetic dataset generation: A case study on Ethereum smart contract DoS vulnerabilities</article-title>
          (
          <year>2025</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>