<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Fraudulent Behaviour Identification in Ethereum Blockchain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Karolis Lašas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabrielė Kasputytė</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rūta Užupytė</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tomas Krilavičius</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Baltic Institute of Advanced Technology, Department of Applied Informatics</institution>
          ,
          <addr-line>Vytautas Magnus, Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Baltic Institute of Advanced Technology, Department of Mathematics and Statistics</institution>
          ,
          <addr-line>Vytautas Magnus, Kaunas</addr-line>
          ,
          <country country="LT">Lithuania</country>
        </aff>
      </contrib-group>
      <fpage>78</fpage>
      <lpage>85</lpage>
      <abstract>
        <p>The phenomenon of cryptocurrencies continues to draw a lot of attention from investors, innovators and the general public. There are over 1300 different cryptocurrencies, including Bitcoin, Ethereum and Litecoin. While the scope of blockchain technology and cryptocurrencies continues to increase, identification of unethical and fraudulent behaviour remains an open issue. The absence of regulation of the cryptocurrency ecosystem and the lack of transparency of the transactions may lead to an increased number of fraudulent cases. In this research, we analyzed the possibility of identifying fraudulent behaviour using different classification techniques. Based on Ethereum transactional data, we constructed a transaction network which was analyzed using a graph traversal algorithm. Clustering and classification were performed using three machine learning algorithms: k-means clustering, Support Vector Machine and random forest classifier. The performance of the classifiers was evaluated using several accuracy metrics that can be calculated from a confusion matrix. Research results revealed that the best performance was achieved using a random forest classification model.</p>
      </abstract>
      <kwd-group>
        <kwd>Cryptocurrency</kwd>
        <kwd>Ethereum</kwd>
        <kwd>Blockchain</kwd>
        <kwd>Fraudulent Activity</kwd>
        <kwd>K-Means Clustering</kwd>
        <kwd>Support Vector Machine</kwd>
        <kwd>Random Forest Classifier</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Cryptocurrencies are a viable alternative to traditional mediums of exchange for purchasing goods or services. The main idea behind this type of currency is that an exchange between two parties can occur without the involvement of a central authority. It is the network itself that manages and confirms each transaction. The overall history of transactions is maintained using blockchain technology, which can be described as a growing list of records that are linked together using cryptography. Each block contains a cryptographic hash of the previous block, a timestamp and transaction data. Even though blockchain technology records information about each transaction, it also assures personal anonymity, as long as there is no link between the wallet and its owner's identity. For this reason, cryptocurrencies are frequently used for fraudulent activities [<xref ref-type="bibr" rid="ref1">1</xref>]. As reported by the blockchain forensics company CipherTrace [<xref ref-type="bibr" rid="ref2">2</xref>], the increasing number of scams led to 4.5 billion dollars in losses in 2019. According to the blockchain monitoring company [<xref ref-type="bibr" rid="ref3">3</xref>], the Ethereum blockchain, which is the second largest cryptocurrency by market capitalization, is the top choice for fraudulent activity. The aim of this paper is to analyze the possibility of using machine learning techniques to identify wallets engaging in fraudulent activities in the Ethereum blockchain.</p>
      <p>The rest of the paper is organized as follows. Related work in this area is presented in Section 2. Section 3 introduces the dataset used in the current study and the performed preprocessing steps. Section 4 presents the selected clustering techniques. Section 4.4 describes the accuracy metrics that were used to evaluate computational results. Experimental results are provided in Section 5. Finally, concluding remarks and future plans are discussed in Section 6.</p>
      <p>IVUS 2020: Information Society and University Studies, 23 April 2020, KTU Santaka Valley, Kaunas, Lithuania. Email: karolis.lasas@bpti.lt (K. Lašas); gabriele.kasputyte@bpti.lt (G. Kasputytė); ruta.uzupyte@bpti.lt (R. Užupytė); tomas.krilavicius@bpti.lt (T. Krilavičius). © 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <p>2. Related Work</p>
      <p>Fraudulent activity identification in cryptocurrency is discussed in [<xref ref-type="bibr" rid="ref4">4</xref>]. The article aims to develop a novel approach based on supervised machine learning to de-anonymize the Bitcoin ecosystem and identify criminal activities in the Bitcoin blockchain. A substantial number of Bitcoin addresses had already been identified, clustered and categorized by the data provider. However, the main part of the clusters was uncategorized. Overall, the dataset contains around 395 million transactions related to 957 unique clusters. The 957 observations which were labeled by the data provider were used for training and test sets. This dataset includes categories commonly associated with illegal activities, including darknet market, mixing, ransomware, scam, stolen bitcoins, and gambling from the perspective of certain jurisdictions. The research method consisted of three iterations using three separate datasets: the initial dataset, the dataset with over-sampled minority classes, and the final one, where all classes were over-sampled to reach the number of observations of the most populated class.</p>
      <p>Upon comparing the results of the three iterations, the over-sampled datasets were discarded. Moreover, the performance of seven algorithms (Decision Trees, Bagging, Random Forests, Extra Trees, AdaBoost, Gradient Boosting and k-Nearest Neighbors) was compared and the best four (Gradient Boosting, Random Forests, Extra Trees and Bagging Classifier) were chosen. Finally, Gradient Boosting was selected as the most accurate algorithm with an average cross-validation accuracy of 80.83%.</p>
      <p>Anomaly detection in the Bitcoin network was analysed in [<xref ref-type="bibr" rid="ref5">5</xref>], where three unsupervised learning methods (k-means clustering, the Mahalanobis distance method and unsupervised Support Vector Machines) were applied. In this research, the Bitcoin transaction network was transformed into two graphs: one with nodes as users and one with nodes as transactions. The dataset consists of more than 6 million unlabeled users with more than 37 million transactions and 30 revealed thieves in the Bitcoin network. However, due to the long run-time, the dataset was limited to 100,000. Both the unsupervised SVM method and the Mahalanobis distance based method suggested similar suspicious users. In this case, two cases of theft and one case of loss out of the 30 known cases were detected.</p>
      <p>The use of machine learning techniques for the identification of abnormal activities in the Ethereum network is discussed in [<xref ref-type="bibr" rid="ref6">6</xref>]. In this case, decision tree classifier, k-nearest neighbors, random forest, Support Vector Machine (SVM), Multi-layer perceptron (MLP) and Naive Bayes algorithms were compared. Using a dataset consisting of 169,192,702 Ethereum transactions, two evaluation models were analysed:
1. testing on 50 originally marked malicious addresses;
2. testing on 50 randomly chosen malicious addresses out of a possible 3830, under the assumption that addresses are marked as malicious if they have an outgoing transaction with the addresses already marked as malicious.</p>
      <p>Malicious addresses are considered to be the ones which perform unauthorized or illegal actions, such as issuing fake tokens, acting as a fake admin in ICOs (Initial Coin Offerings), scambot phishers, slackbot, fake etherscan site, fake site asking for private keys or fake crowdsale site. 125 addresses were identified as malicious and were later split into 75 for training and 50 for testing as ground truth. After applying the previously mentioned assumption, 3830 addresses were marked as malicious. The best results were achieved using the second evaluation model, where SVM, Decision Tree classifier and Random Forest classifier produced the same accuracy of 99.66%. Moreover, 5-fold cross-validation was used to prevent the models from over-fitting.</p>
      <p>A comprehensive identification model for detection of phishing scams in Ethereum is discussed in [<xref ref-type="bibr" rid="ref7">7</xref>]. In this work, a large-scale Ethereum transaction network was built. Additionally, a novel network-embedding algorithm called trans2vec, with biases of transaction amount and timestamp, was designed to extract features from the Ethereum transaction network. Moreover, on account of data imbalance and network heterogeneity, the one-class SVM was adopted to classify the phishing and non-phishing addresses. Finally, the article concluded that, on real Ethereum transaction data, the proposed detection framework is effective and trans2vec is superior to baseline methods in terms of feature extraction.</p>
      <p>To sum up, some of these articles report high accuracy of fraudulent behaviour identification, while a few others report low accuracy. One of the articles found that a new algorithm gives better results than the baseline methods. Differences in the type, size and content of the data caused the differences between the results, even when the same models were applied. In order to analyze the accuracy on our own data, we decided to use three popular and common methods: K-Means clustering, Support Vector Machine and Random Forest classifier.</p>
      <p>3. Data preprocessing and feature extraction</p>
      <p>3.1. Initial data</p>
      <p>The data set consists of two collections of Ethereum transactions. The first collection is composed of about 420 fraudulent wallets identified from the etherscamdb.info database. Detailed information about their transactions was gathered from etherscan.io. The second data collection represents non-fraudulent activities and consists of 53 wallets and their transactional information gathered from the etherscan.io database. Each data set includes:
• transaction hash code;
• sender's address;
• receiver's address;
• transaction value;
• time at which the transaction was made;
• Ethereum block number.</p>
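      <p>As a rough illustration of how records with the fields listed above could be summarised per wallet, the following sketch treats every transfer as a directed edge from sender to receiver and aggregates totals, counts and the time gaps between consecutive transfers. The record layout, the wallet addresses and the function name wallet_features are hypothetical and chosen for illustration only; this is not the preprocessing code used in the study.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict
from statistics import mean, pstdev

# Hypothetical records: (sender, receiver, value in ETH, Unix timestamp).
TRANSACTIONS = [
    ("0xA", "0xB", 1.0, 100),
    ("0xA", "0xC", 3.0, 160),
    ("0xB", "0xA", 0.5, 200),
    ("0xA", "0xB", 2.0, 400),
]


def summarise(events):
    """Summarise a list of (timestamp, value) pairs for one wallet and one direction."""
    times = sorted(t for t, _ in events)
    values = [v for _, v in events]
    gaps = [later - earlier for earlier, later in zip(times, times[1:])]
    return {
        "total": sum(values),
        "count": len(values),
        "avg_value": mean(values) if values else 0.0,
        "avg_gap": mean(gaps) if gaps else 0.0,
        "std_gap": pstdev(gaps) if gaps else 0.0,
    }


def wallet_features(transactions):
    """Walk over all edges once and collect outgoing and incoming activity per wallet."""
    sent = defaultdict(list)      # wallet -> outgoing (timestamp, value) pairs
    received = defaultdict(list)  # wallet -> incoming (timestamp, value) pairs
    for sender, receiver, value, timestamp in transactions:
        sent[sender].append((timestamp, value))
        received[receiver].append((timestamp, value))
    wallets = set(sent) | set(received)
    return {w: {"out": summarise(sent[w]), "in": summarise(received[w])} for w in wallets}


features = wallet_features(TRANSACTIONS)
print(features["0xA"]["out"])
```
]]></preformat>
      <p>In this sketch a wallet with fewer than two transfers in a given direction gets a time gap of zero; a real pipeline would need to decide how such wallets are treated.</p>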
      <p>3.2. Feature extraction</p>
      <p>Transactional data was transformed into a graph, where the nodes represent wallets and edges indicate money transfers. Using a graph traversal algorithm, we identify parameters representing each wallet's behaviour:
• total value in ETH sent by a wallet;
• total value in ETH received by a wallet;
• number of transactions sent by a wallet;
• number of transactions received by a wallet over a time period;
• average time between transactions performed by a sending wallet;
• average time between transactions to a receiving wallet;
• standard deviation of time between transactions performed by a sending wallet;
• standard deviation of time (in seconds) between transactions to a receiving wallet;
• average value in ETH sent by a wallet;
• average value in ETH received by a wallet.</p>
    </sec>
    <sec id="sec-2">
      <title>4. Methodology</title>
      <sec id="sec-2-1">
        <title>4.1. K-Means Clustering</title>
        <p>The first method that we considered was the k-means clustering algorithm, as its computational time is considerably small compared to other similar clustering techniques. K-means clustering may also help to determine underlying patterns of fraudulent and non-fraudulent behaviour by grouping similar wallets' activities. The k-means clustering algorithm works by allocating data points from given input vectors to a predefined number of clusters using a similarity criterion, usually the Euclidean distance:
<disp-formula><tex-math>\| x - \mu_i \|^2,</tex-math></disp-formula>
where <inline-formula><tex-math>x</tex-math></inline-formula> is a data point and <inline-formula><tex-math>\mu_i</tex-math></inline-formula> is the <inline-formula><tex-math>i</tex-math></inline-formula>-th cluster's centroid. Each centroid is calculated by averaging the input vectors assigned to its cluster <inline-formula><tex-math>S_i</tex-math></inline-formula>:
<disp-formula><tex-math>\mu_i = \frac{1}{|S_i|} \sum_{x_j \in S_i} x_j .</tex-math></disp-formula>
The objective of the k-means algorithm is to minimize the total intra-cluster variance.</p>
        <p>Among the disadvantages of the k-means clustering algorithm, such as vulnerability to outliers or inability to cluster heavily overlapping data, there is the manual selection of the number of clusters. The algorithm's inability to automatically select an optimal number of clusters in some cases makes it an unreliable solution to data partitioning, as defining the number of clusters for unlabeled data leaves the user with uncertainty, especially when working with large amounts of data. However, there is no need to guess the number of clusters, as there are a few methods that search for an optimal number. One of them is the elbow method.</p>
        <p>It is one of the oldest methods for defining an optimal number of clusters and works by calculating the sum of squared distances between every data point and its closest centroid [<xref ref-type="bibr" rid="ref8">8</xref>]:
<disp-formula><tex-math>\sum_{i=1}^{k} \sum_{x_j \in S_i} \| x_j - \mu_i \|^2 .</tex-math></disp-formula>
The optimal number of clusters can be identified by a visible "elbow" on the curve (see fig. 1). The last number before the curve flattens is the optimal count of clusters. The main drawback of this method occurs when there is no visible "elbow" on the curve or more than one "elbow" is visible.</p>
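        <p>The elbow computation can be sketched in a few lines. The example below is a minimal illustration, assuming a plain Lloyd-style k-means with a fixed, deterministic initialisation and a small hypothetical set of two-dimensional points; it is not the implementation used for the experiments, and the function names are invented for this sketch.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))


def k_means(points, k, iterations=20):
    """Plain Lloyd iterations; centroids start at evenly spaced input points."""
    step = len(points) // k
    centroids = [points[i * step] for i in range(k)]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for point in points:
            nearest = min(range(k), key=lambda i: squared_distance(point, centroids[i]))
            clusters[nearest].append(point)
        # Each centroid becomes the mean of its cluster (unchanged if the cluster is empty).
        centroids = [
            tuple(sum(axis) / len(cluster) for axis in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
    return centroids, clusters


def within_cluster_sum_of_squares(centroids, clusters):
    """Sum of squared distances between every point and the centroid of its cluster."""
    return sum(
        squared_distance(point, centroid)
        for centroid, cluster in zip(centroids, clusters)
        for point in cluster
    )


# Hypothetical data: two well-separated groups of four points each.
POINTS = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (2.0, 2.0),
          (9.0, 9.0), (9.0, 10.0), (10.0, 9.0), (10.0, 10.0)]

wcss = {k: within_cluster_sum_of_squares(*k_means(POINTS, k)) for k in range(1, 5)}
for k, value in wcss.items():
    print(k, value)
```
]]></preformat>
        <p>For this toy data the sum of squares drops sharply between one and two clusters and changes little afterwards, which is the "elbow" the method looks for.</p>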
      </sec>
      <sec id="sec-2-2">
        <title>4.2. Support Vector Machine</title>
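        <p>In order to find an optimal boundary between wallets with fraudulent and non-fraudulent behaviour, a Support Vector Machine (SVM) is used. It offers high accuracy and requires less computational power than many other machine learning algorithms. SVM aims to find a hyperplane in an <inline-formula><tex-math>n</tex-math></inline-formula>-dimensional space (where <inline-formula><tex-math>n</tex-math></inline-formula> is the number of factors used as input for the model) that separates the given data points into classes. SVM can be used both for regression and classification problems [<xref ref-type="bibr" rid="ref10 ref9">9, 10</xref>]. Consider a data set consisting of <inline-formula><tex-math>m</tex-math></inline-formula> pairs of records <inline-formula><tex-math>(x_1, y_1), (x_2, y_2), \ldots, (x_m, y_m)</tex-math></inline-formula> as a training set, where <inline-formula><tex-math>x_i \in \mathbb{R}^n</tex-math></inline-formula> and <inline-formula><tex-math>y_i \in \{-1, 1\}</tex-math></inline-formula> [<xref ref-type="bibr" rid="ref11">11</xref>]. In order to classify these pairs, we define a hyperplane that will separate them:
<disp-formula><tex-math>\{ x : f(x) = x^{T}\beta + \beta_0 = 0 \},</tex-math></disp-formula>
where <inline-formula><tex-math>\beta</tex-math></inline-formula> is a unit vector (<inline-formula><tex-math>\|\beta\| = 1</tex-math></inline-formula>). Using the defined hyperplane <inline-formula><tex-math>f(x)</tex-math></inline-formula>, a rule for data classification can be written as
<disp-formula><tex-math>G(x) = \operatorname{sign}[x^{T}\beta + \beta_0].</tex-math></disp-formula></p>
        <p>For nonlinear SVM classification, the kernel method is used. The kernel method maps the given input data into a high-dimensional feature space. Popular kernel functions used in this method are:
• Polynomial:
<disp-formula><tex-math>K(x_i, x_j) = (x_i \cdot x_j + 1)^{d},</tex-math></disp-formula>
where <inline-formula><tex-math>d</tex-math></inline-formula> is the degree of the polynomial;
• Gaussian radial basis function (RBF):
<disp-formula><tex-math>K(x_i, x_j) = \exp\{ -\gamma \| x_i - x_j \|^2 \},</tex-math></disp-formula>
where <inline-formula><tex-math>\gamma &gt; 0</tex-math></inline-formula>;
• Sigmoid:
<disp-formula><tex-math>K(x_i, x_j) = \tanh(\kappa\, x_i \cdot x_j + c).</tex-math></disp-formula></p>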
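        <p>To make the kernel definitions and the classification rule concrete, the following sketch evaluates them on small hypothetical vectors. The parameter names degree, gamma, kappa and c and their default values are assumptions made for this illustration; the source does not state which values were used in the experiments.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import math


def dot(x, y):
    """Inner product of two equally long vectors."""
    return sum(a * b for a, b in zip(x, y))


def polynomial_kernel(x, y, degree=2):
    """K(x, y) = (x . y + 1)^degree."""
    return (dot(x, y) + 1) ** degree


def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2), with gamma > 0."""
    squared_norm = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * squared_norm)


def sigmoid_kernel(x, y, kappa=1.0, c=0.0):
    """K(x, y) = tanh(kappa * x . y + c)."""
    return math.tanh(kappa * dot(x, y) + c)


def classify(x, beta, beta_0):
    """Linear decision rule G(x) = sign(x . beta + beta_0), returned as +1 or -1."""
    return 1 if dot(x, beta) + beta_0 >= 0 else -1


print(polynomial_kernel((1, 2), (3, 4)))
print(rbf_kernel((0, 0), (1, 1)))
print(sigmoid_kernel((1, 0), (0, 1)))
print(classify((2, 0), (1, 0), -1), classify((0, 0), (1, 0), -1))
```
]]></preformat>
        <p>Points that lie exactly on the hyperplane are assigned to the positive class here, which is a convention of this sketch rather than part of the method.</p>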
      </sec>
      <sec id="sec-2-3">
        <title>4.3. Random Forest Classifier</title>
        <p>Random Forest is a supervised machine learning algorithm that can be used to solve classification or regression problems and is more flexible with input data than SVM, especially when working with large amounts of data.</p>
        <p>It is a decision tree-based algorithm that randomly selects various data samples and, by calculating predictions for every tree, makes decisions from which it partitions input data into new subsets. It uses averaging to improve the classification accuracy and to control over-fitting. For an <inline-formula><tex-math>n</tex-math></inline-formula>-dimensional input <inline-formula><tex-math>X = (X_1, X_2, \ldots, X_n)</tex-math></inline-formula>, the goal of a random forest is to find a prediction function <inline-formula><tex-math>f(X)</tex-math></inline-formula> for predicting a response variable <inline-formula><tex-math>Y</tex-math></inline-formula>. The prediction function minimizes the expected value of the loss by using a loss function <inline-formula><tex-math>L(Y, f(X))</tex-math></inline-formula>, which usually is the zero-one loss [<xref ref-type="bibr" rid="ref12">12</xref>]:
<disp-formula><tex-math><![CDATA[L(Y, f(X)) = \begin{cases} 0, & \text{if } Y = f(X), \\ 1, & \text{otherwise.} \end{cases}]]></tex-math></disp-formula></p>
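        <p>The zero-one loss and the way a forest combines its trees can be illustrated with a short sketch. The three threshold rules below are hypothetical stand-ins for fitted decision trees, and the feature names and sample values are invented; the sketch only shows majority voting and the averaging of the zero-one loss, not how a forest is trained.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def zero_one_loss(y_true, y_pred):
    """L(Y, f(X)) = 0 if the prediction equals the true label, 1 otherwise."""
    return 0 if y_true == y_pred else 1


def forest_predict(trees, x):
    """Majority vote over the class labels predicted by the individual trees."""
    votes = Counter(tree(x) for tree in trees)
    return votes.most_common(1)[0][0]


# Hypothetical "trees": single-feature threshold rules standing in for fitted decision trees.
TREES = [
    lambda x: "fraud" if x["total_sent"] > 100 else "normal",
    lambda x: "fraud" if x["avg_gap"] < 60 else "normal",
    lambda x: "fraud" if x["tx_count"] > 50 else "normal",
]

# Hypothetical labelled wallets: (features, true label).
SAMPLES = [
    ({"total_sent": 500, "avg_gap": 30, "tx_count": 10}, "fraud"),
    ({"total_sent": 20, "avg_gap": 3600, "tx_count": 80}, "normal"),
    ({"total_sent": 150, "avg_gap": 900, "tx_count": 5}, "fraud"),
]

losses = [zero_one_loss(label, forest_predict(TREES, x)) for x, label in SAMPLES]
empirical_risk = sum(losses) / len(losses)
print(losses, empirical_risk)
```
]]></preformat>
        <p>The mean of the zero-one losses is simply the misclassification rate on the given samples, so minimizing its expected value corresponds to maximizing classification accuracy.</p>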
      </sec>
      <sec id="sec-2-4">
        <title>4.4. Accuracy evaluation</title>
        <p>To estimate the accuracy of the proposed models, we use a few commonly used metrics [<xref ref-type="bibr" rid="ref13 ref14 ref15 ref16">13, 14, 15, 16</xref>] that can be calculated from a confusion matrix, also known as a contingency table (see table 1):
• True Positive Rate:
<disp-formula><tex-math>TPR = \frac{TP}{TP + FN}.</tex-math></disp-formula>
TPR, also known as sensitivity or recall, shows the share of successfully predicted values of a class compared to all values of that class in a data set.
• True Negative Rate (Selectivity):
<disp-formula><tex-math>TNR = \frac{TN}{TN + FP}.</tex-math></disp-formula>
TNR, also known as selectivity, is the share of successfully predicted values for the other class.
• Precision (Positive Predicted Value):
<disp-formula><tex-math>PPV = \frac{TP}{TP + FP}.</tex-math></disp-formula>
• F1-measure:
<disp-formula><tex-math>F_1 = \frac{2 \cdot PPV \cdot TPR}{PPV + TPR}.</tex-math></disp-formula>
The F1-measure is the harmonic mean of recall and precision [<xref ref-type="bibr" rid="ref17">17</xref>] and refers to classification accuracy.</p>
        <p>Here TP is true positive (successfully predicted first class values), TN is true negative (successfully predicted second class values), FP is false positive (faulty predicted second class values, also referred to as type I error) and FN is false negative (faulty predicted first class values, also referred to as type II error).</p>
        <table-wrap>
          <label>Table 3</label>
          <caption>
            <p>Accuracy for different types of kernel</p>
          </caption>
          <table>
            <thead>
              <tr>
                <th>Kernel</th>
                <th>Polynomial</th>
                <th>Sigmoid</th>
                <th>GRB</th>
                <th>Linear</th>
              </tr>
            </thead>
            <tbody>
              <tr>
                <td>F1-measure (%)</td>
                <td>92</td>
                <td>89</td>
                <td>93</td>
                <td>89</td>
              </tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>5. Results</title>
      <sec id="sec-3-1">
        <title>5.1. K-Means Clustering</title>
        <p>In this case, we decided to cluster the data into two groups referring to fraudulent and non-fraudulent wallets. We also applied the elbow method to identify the optimal number of clusters (fig. 1), which confirmed that two clusters are an optimal choice. Using the actual data labels, we evaluated the accuracy of the k-means algorithm. Results revealed that overall clustering accuracy reaches 87% (see table 2). However, while fraudulent wallets were clustered with 93% accuracy, all non-fraudulent wallets were labeled as frauds (table 2). A more detailed study of clustering results was carried out using graphical analysis. For example, figure 2 represents the relationship between the average value in ETH sent by a wallet and the average time between outgoing transactions. Different colours represent separate clusters. By comparing clustering results with the labelled dataset (fig. 3), we can see that the algorithm identifies the most extreme cases (cases with the largest values). However, the model is unable to separate the rest of the data. Based on these results, we can conclude that k-means clustering provides unreliable results.</p>
      </sec>
      <sec id="sec-3-2">
        <title>5.2. Support Vector Classifier</title>
        <p>In order to achieve the best classification result, we performed experiments using four support vector machine classification models:
• linear SVM;
• SVM with polynomial kernel;
• SVM with sigmoid kernel;
• SVM with Gaussian Radial Basis (GRB) kernel.</p>
        <p>The labeled data set was split into training (80 percent of data) and testing (20 percent of data) sets. The highest accuracy (93%) was achieved by using the nonlinear SVM model with the Gaussian Radial Basis (GRB) kernel (table 4). However, although the nonlinear SVM with the GRB kernel classified 96% of fraudulent wallets correctly, 54% of non-fraudulent wallets were classified as frauds.</p>
      </sec>
      <sec id="sec-3-3">
        <title>5.3. Random Forest Classifier</title>
        <p>After performing classification with a random forest classifier (RFC) with 90 trees, we extracted feature importances for model fine tuning (fig. 4). Parameters with an importance level higher than 0.1 were selected as the most important:
• total sent value in ETH;
• the average value in ETH sent by a wallet;
• average time between outgoing transactions;
• standard deviation of time between outgoing transactions;
• frequency of outgoing transactions.</p>
        <p>After defining the list of parameters that have the highest influence on classification results, the random forest classification algorithm was performed. To evaluate the model's accuracy, we used the accuracy metrics discussed in subsection 4.4. The RFC model reaches 95% accuracy (see table 5). This method predicts fraudulent wallets with 97% accuracy and non-fraudulent wallets with 67%.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>6. Conclusions</title>
      <p>In this research, we investigated three machine learning techniques to identify fraudulent behaviour in the Ethereum blockchain data set. First of all, we suggested a data preprocessing framework for the extraction of individual behaviour patterns from a transactional dataset. Based on these patterns, the proposed models were trained and compared according to selected accuracy measures. Experimental results revealed that the random forest classification method is the most suitable for the identification of fraudulent behaviour. Furthermore, the model suggests that the most important factors for fraudulent behaviour identification are the total value in ETH sent by a wallet, the average value in ETH sent by a wallet, the average time between outgoing transactions, the standard deviation of time between outgoing transactions and the frequency of outgoing transactions.</p>
      <p>In the future, we are planning to improve the proposed model's reliability by increasing the number of both fraudulent and non-fraudulent wallets. Moreover, we are planning to analyse the possibility of using the XGBoost method, as it was suggested for identification of abnormal activity in blockchain data [<xref ref-type="bibr" rid="ref18">18</xref>]. Furthermore, we are planning to perform a statistical significance test in order to find out whether differences between results are statistically significant.</p>
    </sec>
    <sec id="sec-5">
      <title>7. Acknowledgments</title>
      <p>We thank Tadas Tamošiūnas, Pavel Sokolov and UAB Kevin EU for cooperation and useful insights.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Baum</surname>
            ,
            <given-names>S. C.</given-names>
          </string-name>
          ,
          <article-title>Cryptocurrency fraud: A look into the frontier of fraud</article-title>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Ciphertrace</surname>
          </string-name>
          ,
          <year>2020</year>
          . URL: https://ciphertrace.com.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Chainalysis</surname>
          </string-name>
          ,
          <year>2020</year>
          . URL: https://www.chainalysis.com/.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>H. H.</given-names>
            <surname>Sun Yin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Langenheldt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Harlev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. R.</given-names>
            <surname>Mukkamala</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vatrapu</surname>
          </string-name>
          ,
          <article-title>Regulating cryptocurrencies: a supervised machine learning approach to de-anonymizing the bitcoin blockchain</article-title>
          ,
          <source>Journal of Management Information Systems</source>
          <volume>36</volume>
          (
          <year>2019</year>
          )
          <fpage>37</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>T.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>Anomaly detection in bitcoin network using unsupervised learning methods</article-title>
          ,
          <source>arXiv preprint arXiv:1611.03941</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Sing</surname>
          </string-name>
          ,
          <article-title>Anomaly Detection in the Ethereum Network</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Indian Institute of Technology Kanpur,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Yuan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>You</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zheng</surname>
          </string-name>
          ,
          <article-title>Who are the phishers? phishing scam detection on ethereum via network embedding</article-title>
          ,
          <source>arXiv preprint arXiv:1911.09259</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>T. M.</given-names>
            <surname>Kodinariya</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. R.</given-names>
            <surname>Makwana</surname>
          </string-name>
          ,
          <article-title>Review on determining number of cluster in k-means clustering</article-title>
          ,
          <source>International Journal 1</source>
          (
          <year>2013</year>
          )
          <fpage>90</fpage>
          -
          <lpage>95</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Awad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Khanna</surname>
          </string-name>
          ,
          <article-title>Support vector machines for classification</article-title>
          ,
          <source>in: Efficient Learning Machines</source>
          , Springer,
          <year>2015</year>
          , pp.
          <fpage>39</fpage>
          -
          <lpage>66</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Beritelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lo Sciuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Scaglione</surname>
          </string-name>
          ,
          <article-title>Rainfall estimation based on the intensity of the received signal in a lte/4g mobile terminal by using a probabilistic neural network</article-title>
          ,
          <source>IEEE Access</source>
          <volume>6</volume>
          (
          <year>2018</year>
          )
          <fpage>30865</fpage>
          -
          <lpage>30873</lpage>
          . doi:10.1109/ACCESS.2018.2839699.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>T.</given-names>
            <surname>Hastie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Tibshirani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Friedman</surname>
          </string-name>
          ,
          <article-title>The elements of statistical learning: data mining, inference, and prediction</article-title>
          , Springer Science &amp; Business Media,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cutler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Cutler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Stevens</surname>
          </string-name>
          ,
          <article-title>Random forests</article-title>
          ,
          <source>in: Ensemble machine learning</source>
          , Springer,
          <year>2012</year>
          , pp.
          <fpage>157</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Fawcett</surname>
          </string-name>
          ,
          <article-title>An introduction to ROC analysis</article-title>
          ,
          <source>Pattern Recognition Letters</source>
          <volume>27</volume>
          (
          <year>2006</year>
          )
          <fpage>861</fpage>
          -
          <lpage>874</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D. M.</given-names>
            <surname>Powers</surname>
          </string-name>
          ,
          <article-title>Evaluation: from precision, recall and F-measure to ROC, informedness, markedness and correlation</article-title>
          (
          <year>2011</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lo Sciuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Połap</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <article-title>Small lung nodules detection based on fuzzy-logic and probabilistic neural network with bio-inspired reinforcement learning</article-title>
          ,
          <source>IEEE Transactions on Fuzzy Systems</source>
          <volume>6</volume>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>F.</given-names>
            <surname>Beritelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Capizzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lo Sciuto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Napoli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Woźniak</surname>
          </string-name>
          ,
          <article-title>A novel training method to preserve generalization of RBPNN classifiers applied to ECG signals diagnosis</article-title>
          ,
          <source>Neural Networks</source>
          <volume>108</volume>
          (
          <year>2018</year>
          )
          <fpage>331</fpage>
          -
          <lpage>338</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Z. C.</given-names>
            <surname>Lipton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Elkan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Narayanaswamy</surname>
          </string-name>
          ,
          <article-title>Optimal thresholding of classifiers to maximize F1 measure</article-title>
          ,
          <source>in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases</source>
          , Springer,
          <year>2014</year>
          , pp.
          <fpage>225</fpage>
          -
          <lpage>239</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ostapowicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Żbikowski</surname>
          </string-name>
          ,
          <article-title>Detecting fraudulent accounts on blockchain: A supervised approach</article-title>
          ,
          <source>in: International Conference on Web Information Systems Engineering</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>18</fpage>
          -
          <lpage>31</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>