<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Contract Vulnerability Detection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Zhigang Xu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chaojun Li</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hongmu Han</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xinhua Dong</string-name>
          <email>xhdong@hbut.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Zhiqiang Zheng</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haitao Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jiaxi Zhang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xingxing Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Orest Kochan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Hubei University of Technology</institution>
          ,
          <addr-line>Wuhan, 430000</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Narcotics Control Bureau of Department of Public Security of Guangdong Province</institution>
          ,
          <addr-line>Guangzhou, 510050</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Smart Contracts</institution>
          ,
          <addr-line>Vulnerability Detection, Neural Networks, Blockchain</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>As blockchain applications continue to grow, the number of smart contracts has exploded. However, an increasing number of hackers are finding vulnerabilities hidden in smart contracts and using them to attack blockchain networks, with serious consequences. To address smart contract vulnerabilities, we propose W2V-SA, a static analysis method based on deep neural networks for vulnerability detection in smart contracts. The approach examines smart contracts before they are deployed to the blockchain network so that vulnerabilities are detected in a timely manner. First, smart contracts are converted into vectors by a word embedding method and used as input data. Second, a hybrid deep neural network model extracts features from the input data and classifies them. Finally, efficient and accurate detection is achieved for six vulnerability types on real smart contracts. The experimental results indicate that the average accuracy of this method exceeds 94% in smart contract vulnerability detection, which demonstrates the effectiveness of W2V-SA.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Smart contracts [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], as a core component of blockchain technology, play an important role in its
application. In the early stage, smart contracts were written in a scripting language for
Bitcoin, mainly used to restrict the inputs and outputs of transactions and implement simple logical
checks, and they were very difficult to write and use. With the emergence of the Ethereum
blockchain, smart contracts based on the Solidity programming language were adopted, making it
easy for developers to write and use smart contracts and implement more complex functions [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Today,
more and more industry sectors are adopting blockchain technology. For example, in the
logistics industry [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], blockchain technology makes the tracing of express deliveries faster and more
credible. In the field of information security [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], blockchain technology can ensure that evidence is not
tampered with and is permanently preserved. Blockchain smart contracts have not only brought great
innovation to the industry but also brought great convenience to everyday life [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        However, blockchain has been under attack ever since its inception. In recent years, more and
more hackers have exploited vulnerabilities in smart contracts to attack blockchain networks, causing
huge economic losses and crises of trust in blockchain networks [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. In 2016, hackers exploited
vulnerabilities in smart contracts to attack the DAO project on the Ethereum platform, stealing
more than $50 million in digital currency [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Smart contracts on the CoinBene exchange were also exploited through a
vulnerability, leading to the theft of a large amount of digital currency [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. In 2020, a smart contract vulnerability on the UniSwap platform
was exploited by hackers, who stole over $500,000 in digital currency
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The security of smart contracts is closely related to the security of the blockchain network. On the
one hand, smart contracts are an important part of the blockchain ecosystem, and their security
directly affects the security and stability of that ecosystem. On the other hand, with
the rapid development of blockchain technology, the application scenarios of smart contracts are
expanding, and only by ensuring the security of smart contracts can they be applied more widely. Therefore,
it is necessary to conduct security audits on smart contracts before deploying them to blockchain
networks [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>2023 Copyright for this paper by its authors.</p>
      <p>To prevent further smart contract security incidents, researchers and companies are
actively developing more secure and reliable smart contract detection technologies and
tools to ensure the security and stability of blockchain systems. However, most smart contract
vulnerability detection methods still suffer from significant drawbacks, including low efficiency,
inability to handle high-volume smart contract vulnerability detection, low automation, and long
detection times. This paper examines the literature published by domestic and foreign scholars on smart
contract vulnerability detection, identifies the shortcomings of existing research, and proposes W2V-SA
(Word2vecSolidity Attention), an automated vulnerability detection method for Ethereum smart contracts
based on deep neural networks. By integrating the benefits of different
neural networks, W2V-SA extracts features accurately and comprehensively, surpasses single-model smart
contract vulnerability detection approaches, achieves higher detection efficiency, and is capable of checking
a large number of smart contracts. The primary contributions of this paper are as follows:</p>
      <p>1. This paper proposes a deep neural network-based vulnerability detection method for smart
contracts (W2V-SA), which develops a specific vulnerability identification model for
each vulnerability type of smart contracts.</p>
      <p>2. By combining different neural network models and word embedding methods, the scheme
proposed in this paper can effectively improve the efficiency and accuracy of smart contract
vulnerability detection.</p>
      <p>3. The results of extensive experimental comparisons indicate that the W2V-SA model, which
combines the advantages of several neural network models, surpasses individual neural network
models and machine learning algorithms in detecting smart contract vulnerabilities. The three
performance evaluation metrics (precision, recall, and F1-score) all exhibit an improvement of over
4%, signifying superior detection performance.</p>
      <p>The structure of this paper comprises five sections. The first section is the introduction,
presenting the background and motivation of the research. The second section gives an overview of related
work. The third section outlines the proposed method and architecture. The fourth section presents the
experimental results. Finally, the fifth section concludes the paper and
provides an outlook for future research.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>This section covers several fundamental concepts related to the research: Ethereum smart
contract vulnerabilities, existing methods for detecting smart contract
vulnerabilities, and neural network models.</p>
    </sec>
    <sec id="sec-3">
      <title>2.1. Smart Contracts Vulnerability</title>
      <p>
        A smart contract is essentially a piece of human-written code that runs automatically on a
blockchain network and completes the given operations according to predefined rules [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
However, code written by humans on the basis of existing knowledge can hardly be guaranteed to behave
correctly in every scenario, so smart contract vulnerabilities are difficult to avoid.
Smart contract vulnerabilities are flaws or weaknesses in the design, implementation,
or use of smart contracts. These weaknesses can be exploited by malicious attackers and lead to
unexpected behavior in smart contracts, such as theft of funds and network crashes [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The reasons
vulnerabilities arise in smart contracts can be summarized as follows.
      </p>
      <p> Code quality issues</p>
      <p>A smart contract is essentially a piece of code, and if the code is not of high quality, it is susceptible
to a variety of vulnerabilities. For example, unvalidated input, infinite loops, stack
overflows, and similar issues can lead to contract execution failures or attacks.</p>
      <p> Logic vulnerabilities</p>
      <p>Writing smart contracts often involves a lot of complex business logic, and if the author does
not consider all possible scenarios, logic vulnerabilities may arise. For example, when writing a
multi-signature contract, if the author does not consider the case where nobody is able to sign, the
contract may be open to attack.</p>
      <p> External dependency issues</p>
      <p>Smart contracts may need to rely on external services and libraries, such as cryptographic libraries
and random number generators. If these external dependencies are vulnerable, the security of the smart
contract can be compromised.</p>
    </sec>
    <sec id="sec-4">
      <title>2.2. Smart Contracts Vulnerability Detection</title>
      <p>
        Currently, the mainstream detection methods for smart contract vulnerabilities fall into two
types: static analysis and dynamic analysis. Static analysis examines
lexical, syntactic, control-flow, and data-flow information before the source code is run. It
does not need to compile or run smart contract code to identify common vulnerabilities. With the rise
of artificial intelligence, many machine learning and deep learning methods have been applied to
static analysis. Liu et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] propose a smart contract vulnerability detection method based on a
combination of deep learning and expert patterns, which converts the code into a semantic graph to
extract deep graph features and then fuses local expert patterns for prediction. Slither [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] is a static
analysis framework that converts smart contract code to SlithIR and serves
simultaneously as a platform for finding bugs, suggesting code optimizations, and improving the
understanding of a given smart contract. Contractward [14] converts smart contract source code
into opcodes and builds models using five machine learning algorithms and two sampling algorithms
to achieve efficient detection of six smart contract vulnerabilities. In summary, static methods often
rely on vulnerability models and vulnerability-type criteria constructed by human
experts [15]. The high cost of manual construction and human subjectivity seriously worsen the false
positive rate and the omission (false negative) rate of detection as the complexity of the code increases.
Static methods are generally only applicable to small-scale programs and cannot cope with large
and complex programs and diverse vulnerabilities.
      </p>
      <p>Dynamic analysis is a method of debugging and checking a program while it is running. Oyente [16]
is a well-known tool for detecting smart contract vulnerabilities based on symbolic execution. This tool
uses symbolic execution to traverse execution paths on the control flow graph to analyze vulnerabilities
in smart contracts, including mishandled exceptions, transaction order dependencies, timestamp
dependencies, and reentrancy. ConFuzzius [17] is the first tool to detect smart contract vulnerabilities
using hybrid fuzzing, which combines evolutionary fuzzing, a lightweight EVM, symbolic taint
analysis, and genetic algorithms to generate inputs that satisfy complex conditions, covering
more program paths and uncovering deeper vulnerabilities. Manticore [18] examines the security of
smart contracts by enumerating all execution paths of the contract. The tool can detect integer overflow
and uninitialized memory vulnerabilities. Ding et al. [19] proposed a fuzzing-based
approach to detect vulnerabilities in Hyperledger Fabric smart contracts. The method integrates well
with the fuzzing tool go-fuzz and has achieved good results in practice. The current mainstream
dynamic analysis methods, such as symbolic execution and fuzz testing, face great challenges. First,
fuzz testing has difficulty finding complex, well-concealed vulnerabilities with long execution paths.
Second, most programs check the validity of their input data, so blind
random mutation makes test case generation inefficient. Third, during fuzz testing,
different test cases may execute the same path and trigger the same vulnerabilities. The challenge with
symbolic execution is path explosion. Dynamic analysis methods require exploring all executable paths
in a smart contract and analyzing its dependency graph [20]. They suffer from
high time overhead and are unsuitable for high-volume contract detection, which
reduces the efficiency of smart contract vulnerability detection.</p>
    </sec>
    <sec id="sec-5">
      <title>2.3. Deep Neural Network Hybrid Model</title>
      <p>In recent years, deep neural network hybrid models have become increasingly popular. In these
models, each component captures different features according to its own characteristics, so the
extracted features are more comprehensive than those of an individual neural network
model. SAMF-BiLSTM [21] is a hybrid neural network model that integrates a self-attention
mechanism with a bidirectional LSTM featuring multichannel features. This model performs well in
sentiment analysis tasks. Jin et al. [22] proposed a hybrid neural network model, Bi-LSTM-CRF with
Self-Attention, for Korean named-entity recognition. Xu et al. [23] proposed a hybrid neural network
model, Self-Attention Bi-LSTM combined with ALBERT, for detecting spam.</p>
      <p>A CNN does not require manual feature selection [24], so its feature extraction is fast and effective,
and it has strong generalization ability [25]. Compared with other network models, the self-attention
mechanism has lower model complexity, fewer parameters, and lower computational requirements, and
its computation at each step does not depend on the results of the previous step, so it can be processed
in parallel, like a CNN [26]. In addition, the self-attention mechanism is not limited by the length of the
text. Even if the text is long, the self-attention mechanism can capture both global and local connections
and the key parts of the text without losing important information. Therefore, this paper
combines these two models with a well-performing word embedding method to build a method for
smart contract vulnerability detection. To demonstrate that the hybrid model performs better
than a single model, the CNN and fastText models are chosen for comparison with
W2V-SA. FastText [27] is a fast text classification model that is close to deep neural network
classification models in terms of precision. Moreover, unlike deep neural network models, fastText
requires no pre-trained word embeddings, which speeds up training and testing while
maintaining high precision. The CNN model [28], first proposed by Kim for sentiment classification
in the field of NLP, uses pre-trained word embeddings and has achieved good
results.</p>
    </sec>
    <sec id="sec-6">
      <title>3. Overall Framework</title>
      <p>To achieve efficient and accurate detection of smart contract vulnerabilities, this paper
introduces a self-attention mechanism into the neural network model to improve the model's feature
extraction and enhance the performance of smart contract vulnerability detection. This section
introduces the structural design of W2V-SA and the function of each of its layers.</p>
      <p>3.1. Overall Method Structure</p>
      <p>[Figure 1: Overall structure of the W2V-SA model. Labels recoverable from the figure: Word Embedding Layer; WE; Cw; labeled smart contracts with word sequences w1, w2, ..., wn and w1, w2, ..., wm; classes POS and NEG; unknown smart contract with words w1, w2, ..., wt.]</p>
      <p>As shown in Figure 1, the W2V-SA hybrid model consists of three layers: the word
embedding layer, the feature extraction layer, and the classification layer. After the smart
contract source code is converted into opcodes, each smart contract takes the word <inline-formula><tex-math>w</tex-math></inline-formula> as its basic unit and forms a
sequence of words <inline-formula><tex-math>\{w_1, w_2, w_3, \ldots, w_n\}</tex-math></inline-formula>. In the word embedding layer, the word2vec model maps each
word <inline-formula><tex-math>w_i</tex-math></inline-formula> to a multidimensional vector <inline-formula><tex-math>WE_i</tex-math></inline-formula>. The word embeddings in the sequence are concatenated to
obtain a word embedding matrix representation of the entire smart contract, <inline-formula><tex-math>C_w = WE_1 + WE_2 + WE_3 + \cdots + WE_n</tex-math></inline-formula>,
where <inline-formula><tex-math>+</tex-math></inline-formula> denotes concatenation. After the smart contracts are converted into word embeddings, they are
labeled with one of two classes, with vulnerability (positive) or without vulnerability (negative). They are
then shuffled, and their features are extracted using a hybrid neural network model. In
the feature extraction layer, the features extracted by the convolutional neural network are input to the
self-attention layer, which calculates the attention weights and mines the relationships between
contexts; the model is trained continuously on smart contracts with and without vulnerabilities.
Finally, the training results are classified and the model is saved. To verify the validity of
the model, unknown smart contracts are fed into the model for prediction to determine whether they
contain a given type of vulnerability.</p>
    </sec>
    <sec id="sec-7">
      <title>3.2. Word Embedding Layer</title>
      <p>Before word embedding training, the source code of a smart contract needs to be
compiled into bytecode and then disassembled into opcodes. According to the Ethereum
specification [29], each bytecode has a corresponding opcode, and there are 144 such
bytecode-to-opcode correspondences in total. The mapping between bytecode and opcode is shown in Table 1.</p>
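      <p>The bytecode-to-opcode step can be illustrated with a minimal sketch. It is not the authors' tooling: the opcode table below is a small, hand-picked subset of the mapping, the function name <monospace>disassemble</monospace> is hypothetical, and dropping the PUSH operands so that only mnemonics remain is an assumption about how the opcode "words" are formed.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Partial opcode table: only the byte values needed for this demo (assumption:
# a real pipeline would use the full mapping referred to as Table 1).
OPCODES = {
    0x00: "STOP",
    0x01: "ADD",
    0x15: "ISZERO",
    0x34: "CALLVALUE",
    0x50: "POP",
    0x52: "MSTORE",
    0x57: "JUMPI",
    0x5B: "JUMPDEST",
    0x80: "DUP1",
    0xFD: "REVERT",
}


def disassemble(bytecode_hex):
    """Turn a hex bytecode string into a list of opcode mnemonics.

    Hypothetical helper: PUSH operands are skipped, so the result is a plain
    sequence of opcode "words".
    """
    hex_str = bytecode_hex[2:] if bytecode_hex.startswith("0x") else bytecode_hex
    code = bytes.fromhex(hex_str)
    tokens = []
    pc = 0
    while pc < len(code):
        byte = code[pc]
        if 0x60 <= byte <= 0x7F:
            # PUSH1..PUSH32 are followed by 1..32 immediate bytes.
            width = byte - 0x5F
            tokens.append("PUSH%d" % width)
            pc += 1 + width
        else:
            tokens.append(OPCODES.get(byte, "UNKNOWN_0x%02x" % byte))
            pc += 1
    return tokens


# A short, made-up bytecode fragment used only for illustration.
tokens = disassemble("0x6080604052348015600f57600080fd5b00")
print(" ".join(tokens))
```
]]></preformat>
      <p>Each contract then becomes a sequence of opcode words that can be passed to a word embedding model.</p>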
      <p>An opcode is text and cannot be fed directly into the model, so it must be converted into a
numerical representation. We use the skip-gram model of the word2vec word
embedding method to vectorize the text. The 144 opcodes are trained with the skip-gram model,
and after repeated testing we chose the most suitable vector representation:
each opcode is mapped to a 30-dimensional vector. Taking the opcode PUSH1 as an example, the
resulting word vector is shown in Table 2.</p>
    </sec>
    <sec id="sec-8">
      <title>3.3. Feature Extraction Layer</title>
      <p>After the skip-gram model finishes training the smart contract word embeddings, the hybrid
neural network model extracts features from the input word embeddings, as shown in Figure 2.</p>
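      <p>As background for the skip-gram training mentioned here, the sketch below shows how skip-gram forms its (center, context) training pairs from an opcode sequence. The window size and the function name <monospace>skipgram_pairs</monospace> are assumptions for illustration; the paper does not state the window it used.</p>
      <preformat preformat-type="code"><![CDATA[
```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) pairs for every token and its neighbors.

    Hypothetical helper: `window` is the number of neighbors considered on
    each side of the center word (an assumed hyperparameter).
    """
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


opcodes = ["PUSH1", "PUSH1", "MSTORE", "CALLVALUE", "DUP1"]
pairs = skipgram_pairs(opcodes, window=1)
for center, context in pairs:
    print(center, "->", context)
```
]]></preformat>
      <p>A skip-gram model is trained to predict the context word from the center word over such pairs, and the learned input weights become the opcode vectors (30-dimensional in this paper). In practice a library implementation would normally be used, for example gensim's <monospace>Word2Vec</monospace> with <monospace>sg=1</monospace> and <monospace>vector_size=30</monospace>; whether the authors used that library is not stated.</p>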
      <p>[Figure 2: Structure of the hybrid neural network model. Layer labels in the figure: Word_Embedding; Conv1d_1 (Conv1D); Max_pooling1d_1 (MaxPooling1D); Conv1d_2 (Conv1D); Max_pooling1d_2 (MaxPooling1D); Conv1d_3 (Conv1D); Max_pooling1d_3 (MaxPooling1D); self_attention_1 (Self_Attention); Flatten_1 (Flatten); Dense_1 (Dense); Dropout_1 (Dropout); Dense_2 (Dense).]</p>
      <sec id="sec-8-10">
        <p>In the convolutional neural network branch, as shown in Figure 2, the output of the word
embedding layer is passed into a convolutional layer, which extracts primary features through
local connectivity and weight sharing. The result is then passed to a pooling layer for down-sampling. The
pooling layer can effectively reduce overfitting and improve the fault tolerance of the model. This study
uses three convolutional layers for local feature learning. The advantage of using
three convolutional layers is a reduced number of network parameters, which reduces the amount
that needs to be learned. As the number of layers increases, the model can extract more complex
and abstract information during training, thus improving learning efficiency.</p>
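        <p>The convolution and pooling steps can be sketched in plain Python as follows. This is an illustrative single-filter example, not the authors' network: the kernel values, the ReLU activation, the pooling size, and the helper names <monospace>conv1d</monospace> and <monospace>max_pool1d</monospace> are all assumptions.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def conv1d(sequence, kernel, bias=0.0):
    """Valid 1D convolution (cross-correlation) followed by ReLU.

    sequence: list of T embedding vectors, each of length D
    kernel:   list of K weight vectors, each of length D (a single filter)
    The same kernel is reused at every position (weight sharing), and each
    output only sees K neighboring vectors (local connectivity).
    """
    k = len(kernel)
    out = []
    for start in range(len(sequence) - k + 1):
        total = bias
        for offset in range(k):
            for x, w in zip(sequence[start + offset], kernel[offset]):
                total += x * w
        out.append(max(0.0, total))  # ReLU (assumed activation)
    return out


def max_pool1d(values, size=2):
    """Non-overlapping max pooling; a trailing partial window is dropped."""
    return [max(values[i:i + size]) for i in range(0, len(values) - size + 1, size)]


# Toy input: five 2-dimensional "embeddings" and one filter of width 2.
embedded = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0], [2.0, 1.0]]
kernel = [[1.0, -1.0], [0.5, 0.5]]
features = conv1d(embedded, kernel)
pooled = max_pool1d(features, size=2)
print(features, pooled)
```
]]></preformat>
        <p>Pooling halves the length of the feature sequence here, which is the down-sampling effect described above.</p>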
        <p>The self-attention mechanism model is another branch of the feature extraction layer, which is an
enhanced version of the attention mechanism that relies less on external information and is proficient
at capturing data features or correlations within features. By learning the correlation between different
words, the self-attention mechanism selects the more relevant words from a vast amount of information.
It enables better comprehension of the connections between local contexts and extraction of essential
feature information while reducing the data quantity.</p>
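        <p>The self-attention computation can be illustrated with a small, dependency-free sketch: every input vector is projected to a query, a key, and a value; query-key dot products give relevance scores; Soft-max turns them into attention weights; and the output is the weighted sum of the values. The identity weight matrices, the toy inputs, and the helper names are assumptions, and no scaling factor is applied because the text does not mention one.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m * x for m, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable Soft-max over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def self_attention(inputs, w_q, w_k, w_v):
    """Return (outputs, attention_weights) for a list of input vectors."""
    queries = [matvec(w_q, a) for a in inputs]
    keys = [matvec(w_k, a) for a in inputs]
    values = [matvec(w_v, a) for a in inputs]
    outputs, all_weights = [], []
    for q in queries:
        # Relevance of this query to every key (dot product).
        relevance = [sum(qi * ki for qi, ki in zip(q, k)) for k in keys]
        weights = softmax(relevance)
        # Output: attention-weighted sum of the value vectors.
        out = [
            sum(w * v[d] for w, v in zip(weights, values))
            for d in range(len(values[0]))
        ]
        outputs.append(out)
        all_weights.append(weights)
    return outputs, all_weights


# Toy example: identity projections (assumed) and two 2-dimensional inputs.
identity = [[1.0, 0.0], [0.0, 1.0]]
inputs = [[1.0, 0.0], [0.0, 1.0]]
outputs, weights = self_attention(inputs, identity, identity, identity)
print(weights[0], outputs[0])
```
]]></preformat>
        <p>Because each output depends only on the inputs and not on a previous step's output, all positions can be computed in parallel.</p>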
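        <p>The text representation sequence obtained from the convolutional layers is input to the self-attention
layer for attention weight calculation. The weight represents the importance of the information: the larger
the weight, the more attention its corresponding information receives. The computing process of
the self-attention mechanism is shown in Figure 3. In the formulas, <inline-formula><tex-math>a_i</tex-math></inline-formula> denotes an input vector and
<inline-formula><tex-math>W^q</tex-math></inline-formula>, <inline-formula><tex-math>W^k</tex-math></inline-formula>, <inline-formula><tex-math>W^v</tex-math></inline-formula> denote the weight matrices. Take input vectors <inline-formula><tex-math>a_1</tex-math></inline-formula> and <inline-formula><tex-math>a_2</tex-math></inline-formula> as an example. First, <inline-formula><tex-math>a_1</tex-math></inline-formula> is multiplied
by the three weight matrices <inline-formula><tex-math>W^q</tex-math></inline-formula>, <inline-formula><tex-math>W^k</tex-math></inline-formula>, <inline-formula><tex-math>W^v</tex-math></inline-formula> to get <inline-formula><tex-math>q_1</tex-math></inline-formula> (query), <inline-formula><tex-math>k_1</tex-math></inline-formula> (key), and <inline-formula><tex-math>v_1</tex-math></inline-formula> (value), respectively, and input
vector <inline-formula><tex-math>a_2</tex-math></inline-formula> is treated in the same way. The calculation can be expressed as
<disp-formula><label>(1)</label><tex-math>q_i = W^q \cdot a_i</tex-math></disp-formula>
<disp-formula><label>(2)</label><tex-math>k_i = W^k \cdot a_i</tex-math></disp-formula>
<disp-formula><label>(3)</label><tex-math>v_i = W^v \cdot a_i</tex-math></disp-formula></p>
        <p>Second, the relevance between input vectors is calculated, where <inline-formula><tex-math>\alpha_{i,j}</tex-math></inline-formula> denotes the relevance
between input vectors <inline-formula><tex-math>a_i</tex-math></inline-formula> and <inline-formula><tex-math>a_j</tex-math></inline-formula>. According to Equation (4), <inline-formula><tex-math>\alpha_{1,2}</tex-math></inline-formula> is obtained by multiplying
<inline-formula><tex-math>q_1</tex-math></inline-formula> and <inline-formula><tex-math>k_2</tex-math></inline-formula>. Third, the attention
scores between the input vectors are computed: according to Equation (5), <inline-formula><tex-math>\alpha_{1,2}</tex-math></inline-formula> is passed through the
Soft-max activation function to obtain the attention score
<inline-formula><tex-math>\alpha'_{1,2}</tex-math></inline-formula> between input vectors <inline-formula><tex-math>a_1</tex-math></inline-formula> and <inline-formula><tex-math>a_2</tex-math></inline-formula>. Finally, the output vector <inline-formula><tex-math>b_2</tex-math></inline-formula> is obtained by multiplying the attention scores <inline-formula><tex-math>\alpha'_{1,2}</tex-math></inline-formula> with the value
<inline-formula><tex-math>v_2</tex-math></inline-formula> according to Equation (6).</p>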
        <sec id="sec-8-10-1">
          <p>[Figure 3: Computing process of the self-attention mechanism. Row labels, top to bottom: Output (b1 to b4); Attention score (α′1,1 to α′1,4, after Soft-max); relevance (α1,1 to α1,4); Input (a1 to a4, each with its q, k, and v vectors).]</p>
        </sec>
        <sec id="sec-8-10-2">
          <p><disp-formula><label>(4)</label><tex-math>\alpha_{i,j} = q_i \cdot k_j</tex-math></disp-formula></p>
          <p>The layer following the self-attention layer is the Flatten layer, which collapses the
multi-dimensional output into a one-dimensional vector. The Dropout layer is positioned after the Flatten layer,
and it ignores a fraction of the neurons with a given probability in each training batch, which can
significantly reduce overfitting. The final layer is the Dense layer, which maps the feature space
computed in the previous layers to the sample label space. Its primary role is to integrate the feature
representation into a single value, which reduces the influence of feature position on the classification
results and enhances the robustness of the entire network.</p>
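          <p>The three closing layers can be sketched as follows. This is only an illustration under stated assumptions: inverted dropout (surviving values are rescaled), a single sigmoid output unit for the binary vulnerable/non-vulnerable decision, hand-picked weights, and hypothetical helper names.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import math
import random


def flatten(feature_map):
    """Collapse a 2D feature map (list of rows) into one flat vector."""
    return [x for row in feature_map for x in row]


def dropout(values, rate, rng, training=True):
    """Inverted dropout: zero each value with probability `rate` during
    training and rescale the survivors; pass values through at inference."""
    if not training or rate == 0.0:
        return list(values)
    keep = 1.0 - rate
    return [v / keep if rng.random() >= rate else 0.0 for v in values]


def dense_sigmoid(values, weights, bias):
    """Single dense unit with sigmoid activation (assumed binary output)."""
    z = sum(v * w for v, w in zip(values, weights)) + bias
    return 1.0 / (1.0 + math.exp(-z))


feature_map = [[0.2, 0.8], [0.5, 0.1]]
flat = flatten(feature_map)
rng = random.Random(0)
dropped = dropout(flat, rate=0.5, rng=rng)            # training-time view
score = dense_sigmoid(flat, [1.0, -1.0, 0.5, 0.0], bias=0.0)  # inference
print(flat, dropped, round(score, 4))
```
]]></preformat>
          <p>A score above a chosen threshold (commonly 0.5) would be read as "contains the vulnerability"; the threshold is likewise an assumption.</p>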
          <p>The W2V-SA model, which combines the word embedding model with the hybrid deep neural
network model, is proposed in this paper. By comprehensively extracting features, W2V-SA can
effectively enhance the efficiency of smart contract vulnerability detection. In the following section,
we employ the W2V-SA model in actual smart contract vulnerability detection and establish the
effectiveness of the proposed model through numerous experiments.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>4. Experiments and Results</title>
      <p>This section presents a series of comprehensive experiments that evaluate the efficiency of
the hybrid neural network model proposed in this paper for detecting vulnerabilities in smart contracts.</p>
    </sec>
    <sec id="sec-10">
      <title>4.1. Dataset</title>
      <p>The data utilized in this paper comprises authentic smart contracts sourced from the official
Ethereum website. After verification and removal of duplicate and incomplete smart contracts, we
acquired a total of 98,919 smart contracts. In this study, the dataset was categorized and labeled into
two groups, namely smart contracts with vulnerabilities and smart contracts without vulnerabilities. The
detection of six types of smart contract vulnerabilities was the focus of this paper, including AssertFail
vulnerability, BlockTimestamp vulnerability, CheckEffects vulnerability, LowLevelCalls
vulnerability, Reentrancy vulnerability, and IntegerUnderFlow vulnerability. Table 3 illustrates the
distribution of the smart contract dataset.</p>
      <p>Table 3 indicates that the distribution of smart contracts with and without vulnerabilities is
imbalanced, and using the data directly could lead to severe overfitting. Therefore, the dataset requires balancing
before the word embedding transformation is performed. For imbalanced datasets, this paper uses
ADASYN, an adaptive synthetic sampling method proposed by He et al. [30].
ADASYN weights each minority-class sample by how difficult it is to learn and generates
proportionally more synthetic samples for the difficult ones, so that the resampled data better reflects the
data distribution. Table 4 displays the
balanced dataset, whose distribution is more reasonable than that of the original
dataset.</p>
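      <p>The idea behind ADASYN can be sketched in a few lines of plain Python. This is a simplified illustration rather than the procedure used in the experiments: the neighborhood size <monospace>k</monospace>, the balance level <monospace>beta</monospace>, the toy 2D points, and the choice to interpolate toward the nearest minority neighbors are assumptions.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import random


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def adasyn(minority, majority, k=2, beta=1.0, seed=0):
    """Return synthetic minority samples (illustrative ADASYN-style sketch)."""
    rng = random.Random(seed)
    # Number of synthetic samples needed to reach the balance level beta.
    needed = int(round((len(majority) - len(minority)) * beta))
    if needed <= 0 or len(minority) < 2:
        return []
    labeled = [(p, 1) for p in minority] + [(p, 0) for p in majority]
    # Difficulty of each minority point: share of majority-class points
    # among its k nearest neighbors in the whole dataset.
    ratios = []
    for i, x in enumerate(minority):
        others = sorted(
            (sq_dist(x, p), label)
            for j, (p, label) in enumerate(labeled) if j != i
        )
        ratios.append(sum(1 for _, label in others[:k] if label == 0) / k)
    total = sum(ratios)
    if total == 0:  # no minority point has majority neighbors: spread evenly
        ratios, total = [1.0] * len(minority), float(len(minority))
    # Harder points receive proportionally more synthetic samples.
    counts = [int(round(r / total * needed)) for r in ratios]
    synthetic = []
    for i, x in enumerate(minority):
        # Simplification: interpolate toward one of the k nearest minority points.
        neighbors = sorted(
            (j for j in range(len(minority)) if j != i),
            key=lambda j: sq_dist(x, minority[j]),
        )[:k]
        for _ in range(counts[i]):
            z = minority[rng.choice(neighbors)]
            lam = rng.random()  # interpolation factor in [0, 1)
            synthetic.append([a + lam * (b - a) for a, b in zip(x, z)])
    return synthetic


# Toy data: 3 minority points and 7 majority points in 2D (made up).
minority = [[0.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
majority = [[5.0, 6.0], [6.0, 5.0], [4.0, 5.0], [5.0, 4.0],
            [6.0, 6.0], [1.0, 0.5], [9.0, 9.0]]
synthetic = adasyn(minority, majority, k=2)
print(len(synthetic), "synthetic minority samples")
```
]]></preformat>
      <p>In this toy data the minority point surrounded by majority points receives twice as many synthetic samples as the other two. A maintained implementation such as the <monospace>ADASYN</monospace> class in imbalanced-learn would normally be preferred over hand-written code; the paper does not say which implementation was used.</p>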
      <p>The fundamental concept of the W2V-SA model is to merge word embedding with a hybrid deep
neural network model to enhance the model's performance. Figure 4 illustrates the training process of
the W2V-SA model, and all four curves of the model gradually stabilize after reaching 15 training
iterations. Throughout the training phase, the training and test sets exhibit similar curves and are in
close proximity, indicating that the model does not overfit during the training process.</p>
      <p>The W2V-SA model is evaluated for detecting common smart contract vulnerabilities, including
AssertFail vulnerability, BlockTimestamp vulnerability, CheckEffects vulnerability, LowLevelCalls
vulnerability, Reentrancy vulnerability, and IntegerUnderFlow vulnerability. The detection outcomes
are presented in Table 5.</p>
      <p>Table 5 demonstrates that the W2V-SA model presented in this paper achieves over 90% for the
three evaluation metrics, Precision, Recall, and F1-score, for the six aforementioned vulnerabilities. To
establish the superiority of the W2V-SA model, this paper compares it with the fastText and CNN
models to evaluate the performance of detecting smart contract vulnerabilities under the same
experimental environment. The results of the comparison between the W2V-SA model proposed in this
paper and fastText and CNN are depicted in Figure 5 to Figure 7.</p>
      <p>As shown in Figure 5 and Figure 6, the W2V-SA model achieves the best Precision on all six
vulnerability types, exceeding 90% on each and averaging 94.85%. CNN is next, with an average
Precision of 88.34%, and fastText is last, with an average of only 84.99%. The ranking is the same
for Recall: W2V-SA exceeds 90% on each type and averages 94.79%, CNN averages 88.08%, and fastText
averages only 84.99%. The W2V-SA model is therefore more accurate in detecting smart contract
vulnerabilities, and better suited to the task, than fastText and CNN. Looking at an individual
vulnerability type, W2V-SA gives the best Reentrancy detection, with Precision and Recall both
exceeding 97%. Of the two baselines, CNN performs relatively well on Reentrancy, with Precision and
Recall exceeding 93%, while fastText is the weakest, with neither metric exceeding 90%. Both CNN and
fastText still fall short of the W2V-SA model on this vulnerability type.</p>
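      <p>As a minimal sketch of how per-type Precision and Recall and their averages can be computed,
the following Python fragment derives both metrics from true-positive, false-positive, and
false-negative counts. The counts and the helper names (<monospace>precision</monospace>,
<monospace>recall</monospace>, <monospace>counts</monospace>) are hypothetical and are not taken from
Table 5, and the sketch assumes that the reported averages are unweighted means over the
vulnerability types.</p>
      <preformat>
```python
from statistics import mean


def precision(tp: int, fp: int) -> float:
    """Fraction of contracts flagged as vulnerable that really are vulnerable."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def recall(tp: int, fn: int) -> float:
    """Fraction of truly vulnerable contracts that the model flags."""
    return tp / (tp + fn) if (tp + fn) else 0.0


# Hypothetical (tp, fp, fn) counts per vulnerability type; NOT taken from Table 5.
counts = {
    "Reentrancy": (98, 2, 2),
    "IntegerUnderFlow": (92, 8, 6),
}

per_type_precision = {name: precision(tp, fp) for name, (tp, fp, fn) in counts.items()}
per_type_recall = {name: recall(tp, fn) for name, (tp, fp, fn) in counts.items()}

# Unweighted (macro) average over the vulnerability types.
avg_precision = mean(list(per_type_precision.values()))
avg_recall = mean(list(per_type_recall.values()))

print(f"average Precision: {avg_precision:.4f}")
print(f"average Recall:    {avg_recall:.4f}")
```
      </preformat>
      <p>An unweighted average treats every vulnerability type equally regardless of how many samples
it has; a sample-weighted average would be an alternative if the test sets differ greatly in size.</p>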
      <p>The F1-score measures the overall performance of a model by combining Precision and Recall; a
higher F1-score indicates a better model. As illustrated in Figure 7, the average F1-score of W2V-SA
exceeds 94%, followed by CNN with an average of 88.21%, while fastText has the lowest average, only
84.99%. CNN and fastText are both single-model approaches, whereas the W2V-SA model proposed in this
paper combines the advantages of several deep neural network models. Under the same experimental
environment, the average of every metric for the W2V-SA model exceeds that of the CNN model by more
than 6 percentage points and that of the fastText model by nearly 10 percentage points. W2V-SA has
the highest F1-score of all the models compared, which indicates the effectiveness of its
vulnerability detection method. Its Recall and Precision are also higher than those of CNN and
fastText, so it identifies more of the samples that contain smart contract vulnerabilities.</p>
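      <p>The arithmetic behind this comparison can be sketched as follows. The fragment defines the
F1-score as the harmonic mean of Precision and Recall and computes the percentage-point lead of
W2V-SA over each baseline from the average Precision and Recall values quoted in the text. The
function names (<monospace>f1_score</monospace>, <monospace>gap</monospace>) and the two F1 inputs at
the end are illustrative assumptions, not part of the original experiments.</p>
      <preformat>
```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of Precision and Recall (0.0 when both are zero)."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


# Average Precision and Recall (in %) as quoted in the text.
reported = {
    "W2V-SA": {"precision": 94.85, "recall": 94.79},
    "CNN": {"precision": 88.34, "recall": 88.08},
    "fastText": {"precision": 84.99, "recall": 84.99},
}


def gap(metric: str, baseline: str) -> float:
    """Percentage-point lead of W2V-SA over a baseline on one metric."""
    return round(reported["W2V-SA"][metric] - reported[baseline][metric], 2)


for baseline in ("CNN", "fastText"):
    print(baseline, gap("precision", baseline), gap("recall", baseline))

# Hypothetical inputs, only to show that F1 drops when Precision and Recall diverge.
balanced = f1_score(0.95, 0.95)
imbalanced = f1_score(0.99, 0.60)
print(round(balanced, 4), round(imbalanced, 4))
```
      </preformat>
      <p>Because the harmonic mean is dominated by the smaller of its two inputs, a model cannot reach
a high F1-score by excelling at only one of Precision or Recall.</p>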
      <p>These results suggest that the W2V-SA model proposed in this paper is highly effective in
detecting smart contract vulnerabilities, with all metrics above 90%, and that it clearly
outperforms the single-model neural network baselines. In summary, in contrast to previous smart
contract vulnerability detection models, the W2V-SA model can be updated for different types of
vulnerabilities while maintaining a high accuracy rate. Nevertheless, the model has limitations.
First, it can only identify whether a smart contract contains one given type of vulnerability; it
cannot detect whether a contract contains multiple types of vulnerabilities at once. Second, its
Precision is higher for vulnerabilities with clear features and lower for vulnerabilities with
obscure features. For example, its Precision is 97.71% for Reentrancy vulnerabilities but only
91.86% for IntegerUnderFlow vulnerabilities.</p>
    </sec>
    <sec id="sec-11">
      <title>5. Conclusions</title>
      <p>The W2V-SA model proposed in this paper is a deep neural network-based smart contract
vulnerability detection model that combines the advantages of several deep neural network models.
By taking contextual information into account during vulnerability detection, the model extracts
features accurately and quickly. To evaluate its performance, we compared it with other smart
contract vulnerability detection methods under the same experimental environment. The results show
that the W2V-SA model has excellent smart contract vulnerability detection capabilities and
outperforms the other models in detecting common smart contract vulnerabilities. Furthermore,
comparative experiments demonstrated the effectiveness of the self-attention mechanism in the
model.</p>
      <p>In future work, we will study vector representation and feature extraction methods for text,
improve the model structure, and increase the efficiency of smart contract vulnerability detection.
We plan to pursue two directions: (1) using sentence vectors or other word embedding models to
improve the model structure; and (2) combining different attention mechanisms to improve our
approach. Because smart contract vulnerabilities are constantly evolving and can cause serious
harm, more efficient detection methods are needed.</p>
    </sec>
    <sec id="sec-12">
      <title>6. Acknowledgments</title>
      <p>This work is supported by the Key-Area Research and Development Program of Guangdong
Province (2020B1111420002), the Key-Area Research and Development Program of Hubei Province
(2022BAA040), the Science and Technology Project of the Department of Transport of Hubei Province
(2022-11-4-3), and the Innovation Fund of Hubei University of Technology (BSQD2019027,
BSQD2019020, and BSQD2016019). We deeply appreciate your consideration of our manuscript, and
we look forward to receiving comments from the reviewers.</p>
    </sec>
    <sec id="sec-13">
      <title>7. References</title>
      <p>[14] w. Wang, J. Song, G. Xu, Y. Li, H. Wang, C. Su, Contractward: automated vulnerability detection
models for ethereum smart contracts, IEEE Transactions on Network Science and Engineering 8
(2020) 1133-1144. doi:10.1109/TNSE.2020.2968505.
[15] Piantadosi V, Rosa G, Placella D, Scalabrino S, Oliveto R. Detecting functional and
securityrelated issues in smart contracts: A systematic literature review. Software: Practice and Experience
2023, 53(2): 465-495. doi: 10.1002/spe.3156
[16] L. Luu, D-H. Chu, H. Olickel, P. Saxena, A. Hobor, Making smart contracts smarter, in:
Proceedings of the 2016 ACM SIGSAC conference on computer and communications security;
2016. pp. 254-269. doi:10.1145/2976749.2978309.
[17] C. F. Torres, A. K. Iannillo, A. Gervais, R. State, ConFuzzius: a data dependency-aware hybrid
fuzzer for smart contracts, in: Proceedings of the 2021 IEEE European Symposium on Security
and Privacy (EuroS&amp;P), 2021, pp. 103-119. doi:10.1109/EuroSP51992.2021.00018.
[18] M. Mossberg, F. Manzano, E. Hennenfent, A. Groce, G. Grieco, J. Feist, T. Brunson, A. Dinaburg,
Manticore: a user-friendly symbolic execution framework for binaries and smart contracts, in:
Proceedings of the 2019 34th IEEE/ACM International Conference on Automated Software
Engineering (ASE), 2019, pp. 1186-1189. doi:10.1109/ASE.2019.00133.
[19] M. Ding, P. Li, S. Li, H. Zhang, Hfcontractfuzzer: fuzzing hyperledger fabric smart contracts for
vulnerability detection, Evaluation and Assessment in Software Engineering, 2021, pp 321-328.
doi:10.48550/arXiv.2106.11210.
[20] Ivanov N, Li C, Sun Z, Cao Z, Luo X, Yan Q. Security Threat Mitigation For Smart Contracts: A</p>
      <p>Survey. arXiv preprint arXiv:2302 07347 2023. doi: 10.48550/arXiv.2302.07347
[21] W. Li, F. Qi, M. Tang, Z. Yu, Bidirectional LSTM with self-attention mechanism and
multichannel features for sentiment classification, Neurocomputing 387 (2020) 63-77. doi:
10.1016/j.neucom.2020.01.006.
[22] G. Jin, Z. Yu, A Korean named entity recognition method using Bi-LSTM-CRF and masked
selfattention, Computer Speech &amp; Language 65 (2021) 101134. doi: 10.1016/j.csl.2020.101134.
[23] G. Xu, D. Zhou, J. Liu, Social network spam detection based on ALBERT and combination of
BiLSTM with self-attention, Security and Communication Networks 2021 (2021) 5567991. doi:
10.1155/2021/5567991.
[24] Cong S, Zhou Y. A review of convolutional neural network architectures and their optimizations.</p>
      <p>Artificial Intelligence Review 2023, 56(3): 1905-1969. doi: 10.1007/s10462-022-10213-5
[25] M.-T. Fang, K. Przystupa, Z.-J. Chen, et al, Examination of abnormal behavior detection based on
improved YOLOv3, Electronics, 10 (2021) 197. doi: 10.3390/electronics10020197.
[26] Pan X, Ge C, Lu R, Song S, Chen G, Huang Z, Huang G. On the integration of self-attention and
convolution. Proceedings of the IEEE/CVF conference on computer vision and pattern
recognition; 2022. pp. 815-825. doi: 10.48550/arXiv.2111.14556
[27] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of tricks for efficient text classification, in:
Proceedings of the 15th Conference of the European Chapter of the Association for Computational
Linguistics: Volume 2, Short Papers, 2017, pp. 427–431. doi: 10.48550/arXiv.1607.01759
[28] Y. Kim, Convolutional neural networks for sentence classification, in: Proceedings of the 2014
Conference on Empirical Methods in Natural Language Processing (EMNLP): Association for
Computational Linguistics, 2014, pp. 1746-1751. doi: 10.3115/v1/D14-1181.
[29] G. Wood, Others. Ethereum: A secure decentralised generalised transaction ledger. Ethereum
project yellow paper 151 (2014) 1-32.
[30] H. He, Y. Bai, E. A. Garcia, S. Li, ADASYN: adaptive synthetic sampling approach for imbalanced
learning, in: Proceedings of the 2008 IEEE international joint conference on neural networks (IEEE
world congress on computational intelligence), 2008, pp. 1322-1328. doi:
10.1109/IJCNN.2008.4633969.</p>
    </sec>
  </body>
  <back>
  </back>
</article>