<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>The Hubs and Authorities Transaction Network Analysis using the SANSA framework</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Danning Sui</string-name>
          <email>danning.sui@consensys.net</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gezim Sejdiu</string-name>
          <email>sejdiu@cs.uni-bonn.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Damien Graux</string-name>
          <email>damien.graux@iais.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jens Lehmann</string-name>
          <email>jens.lehmann@cs.uni-bonn.de</email>
          <email>jens.lehmann@iais.fraunhofer.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fraunhofer Institute for Intelligent Analysis and Information Systems</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Smart Data Analytics, University of Bonn</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>With the recent interest in blockchain technology, many users want to know more about the important players of the chain. In this study, we investigate and analyze the Ethereum blockchain network in order to identify the major entities across its transaction network. By leveraging the rich data available through Alethio's platform in the form of RDF triples, we learn about the Hubs and Authorities of the Ethereum transaction network. Alethio uses SANSA for efficient reading and processing of such large-scale RDF data (transactions on the Ethereum blockchain) in order to perform analytics, e.g. finding top accounts or typical behavior patterns of exchanges' deposit wallets.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>[Figure 1: Overview of the analysis pipeline. Data ingestion of EthOn RDF triples from Amazon S3 buckets; data partition; querying (SPARQL); Connected Components; PageRank; Hubs &amp; Authorities entities; data visualization using Databricks notebooks or SANSA notebooks (top accounts, Hubs &amp; Authorities, exchange wallet behavior).]</p>
      <p>In this paper, we use well-known graph processing algorithms to analyze the
value transaction network graph, with the main focus on the behavior of Hubs and
Authorities. "Authorities" are accounts that pay out high volumes to a large crowd
of addresses, while "Hubs" are entities that receive extensive Ether (ETH) flows
into their accounts. In this study, we do not differentiate between these two roles
but rank them all together as the biggest players (entities) of the network.</p>
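      <p>To make the two roles concrete, the following minimal Python sketch computes, for each address, the total ETH paid out (the "Authority" side) and the total ETH received (the "Hub" side), and then ranks addresses by their combined flow without separating the roles. It is only an illustration under stated assumptions: the edge list is made up, the function names flow_profiles and rank_big_players are hypothetical, and a plain volume ranking is a simpler stand-in for the PageRank-based ranking used in this study.</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical aggregated edges: (sender, receiver, total ETH sent).
# Addresses and values are made up for illustration only.
edges = [
    ("pool", "miner1", 3.0),
    ("pool", "miner2", 2.0),
    ("miner1", "exchange", 2.5),
    ("miner2", "exchange", 1.5),
    ("alice", "exchange", 4.0),
]


def flow_profiles(edges):
    """Return per-address outgoing volume, incoming volume and out-degree."""
    out_volume = defaultdict(float)  # total ETH paid out ("Authority" side)
    in_volume = defaultdict(float)   # total ETH received ("Hub" side)
    out_degree = defaultdict(int)    # number of distinct payees
    for src, dst, value in edges:
        out_volume[src] += value
        out_degree[src] += 1
        in_volume[dst] += value
    return out_volume, in_volume, out_degree


def rank_big_players(edges, top_k=3):
    """Rank addresses by total flow (in + out), without separating the two roles."""
    out_volume, in_volume, _ = flow_profiles(edges)
    nodes = set(out_volume) | set(in_volume)
    total = {n: out_volume[n] + in_volume[n] for n in nodes}
    # Highest total flow first; ties broken alphabetically for determinism.
    return sorted(nodes, key=lambda n: (-total[n], n))[:top_k]


print(rank_big_players(edges))
# ['exchange', 'miner1', 'pool']
```
      </preformat>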
    </sec>
    <sec id="sec-2">
      <title>Finding big Ethereum players with SANSA</title>
      <p>The Ethereum network graph contains nodes for external accounts that have
had a transaction on the Ethereum blockchain. The connections (edges) between
such nodes indicate the transaction relationships between them: when a node (an
external account) sends ETH to another, a transaction record is written, and an
edge between them is added to the network in the direction of the ETH flow. When
we encounter multiple edges between the same pair of nodes, we merge them into a
single edge<sup>3</sup> whose weight is the total transaction value in Ether. For
example, if address A sends x ETH to address B in total, there will be an edge of
weight x from node A to node B. In this study, self-loops, i.e. transactions from
an address to itself, are omitted.</p>
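      <p>The edge-construction rule above (sum all transfers per ordered address pair, drop self-loops) can be illustrated with a minimal Python sketch. It is not the SANSA/Spark code used in the study: it assumes the transactions are already available in memory as plain (sender, receiver, value) tuples, and the function name aggregate_transactions and the sample addresses are hypothetical. In Spark, the same step would presumably be a key-based reduction (such as reduceByKey) over an RDD of such tuples.</p>
      <preformat>
```python
from collections import defaultdict


def aggregate_transactions(transactions):
    """Collapse raw value transfers into one weighted edge per (sender, receiver).

    transactions: iterable of (sender, receiver, value_in_eth) tuples.
    Self-loops (sender == receiver) are skipped, and all transfers between the
    same ordered pair are summed into a single edge weight.
    """
    weights = defaultdict(float)
    for sender, receiver, value in transactions:
        if sender == receiver:
            continue  # omit transactions from an address to itself
        weights[(sender, receiver)] += value
    return dict(weights)


# Hypothetical transfers: A pays B twice, and C sends ETH to itself once.
txs = [("A", "B", 1.5), ("A", "B", 2.5), ("B", "A", 1.0), ("C", "C", 9.0)]
print(aggregate_transactions(txs))
# {('A', 'B'): 4.0, ('B', 'A'): 1.0}
```
      </preformat>
      <p>Note that the edges stay directed: the transfers from A to B and from B to A remain two separate edges, as the text requires.</p>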
      <p>
        The SANSA framework was used for efficient reading and querying of RDF
datasets using SPARQL, as depicted in Figure 1. First, the data needs to be
loaded into efficient storage that SANSA can read from; for that purpose, we
use Amazon S3 buckets containing all Ethereum network transactions as RDF.
Afterwards, the SANSA data representation layer loads the data in the form of
Resilient Distributed Datasets (RDD) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] of triples. During this process, SANSA partitions the data for fast
processing and then aggregates and filters it using its query layer [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Further, we applied two classic graph analysis algorithms via Apache GraphX:
Connected Components and PageRank. The Connected Components algorithm enables us
to find the largest cluster of connected nodes, regardless of transaction direction.
Within this largest cluster, we then derive the PageRank score of every node. The
top-ranked entities and their relations are visualized.<fn id="fn3"><label>3</label><p>This optimization is also practically convenient, as it avoids duplicated edges in the graph.</p></fn>
      </p>
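      <p>The connected-component step can be sketched in a few lines of Python. This is not GraphX code: it is a minimal union-find (disjoint-set) sketch, assuming the aggregated edges fit in memory as (sender, receiver) pairs; the function name largest_component and the sample addresses are hypothetical. GraphX's own connectedComponents operator instead labels each component with its lowest vertex id by iterative propagation, but both approaches treat edges as undirected and yield the same components.</p>
      <preformat>
```python
def largest_component(edges):
    """Return the node set of the largest connected component, ignoring edge direction.

    edges: iterable of (src, dst) pairs. Uses a union-find (disjoint-set)
    structure with path halving instead of GraphX's label propagation.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    for src, dst in edges:
        root_a, root_b = find(src), find(dst)
        if root_a != root_b:
            parent[root_a] = root_b  # merge; the edge direction is irrelevant

    # Group every node under the root of its set.
    components = {}
    for node in list(parent):
        components.setdefault(find(node), set()).add(node)
    return max(components.values(), key=len)


# Hypothetical graph with two clusters: {A, B, C, D} and {X, Y}.
edges = [("A", "B"), ("C", "B"), ("D", "C"), ("X", "Y")]
print(sorted(largest_component(edges)))
# ['A', 'B', 'C', 'D']
```
      </preformat>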
    </sec>
    <sec id="sec-3">
      <title>Results</title>
      <sec id="sec-3-1">
        <title>Datasets</title>
        <p>The Ethereum dataset in RDF format contains more than 17 billion triples. For
the sake of the experiment, we limited the dataset to 10,000 blocks, which contain
around 38 million triples, including both value transactions and contract messages.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Top Accounts Analysis</title>
        <p>The PageRank algorithm was run over the largest connected component, which
consists of 185,741 nodes (accounts) and 250,637 edges (aggregated transaction
relations). Figure 2 plots the distribution of the top 50 accounts. Based on these
findings, the accounts fall into two different types: mining pool wallets and
(mostly centralized) exchange wallets.</p>
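        <p>The ranking step can be illustrated with a small power-iteration PageRank in Python. The study does not state whether transaction values enter the PageRank computation; GraphX's built-in PageRank, to our understanding, splits a node's rank evenly over its out-edges and ignores edge attributes. The sketch therefore shows a value-weighted variant as an assumption, in which each sender passes on rank in proportion to the ETH sent along each edge. The damping factor 0.85 is the conventional default (it corresponds to GraphX's default resetProb of 0.15); the function name, its parameters and the toy graph are hypothetical.</p>
        <preformat>
```python
def weighted_pagerank(edges, damping=0.85, iterations=100, tol=1e-10):
    """Value-weighted PageRank computed by power iteration.

    edges: dict mapping (src, dst) -> positive weight (e.g. total ETH sent).
    Each node passes its rank to successors in proportion to edge weight;
    the rank of dangling nodes (no outgoing edges) is spread uniformly.
    """
    nodes = sorted({node for pair in edges for node in pair})
    if not nodes:
        return {}
    n = len(nodes)
    out_weight = {v: 0.0 for v in nodes}
    for (src, _dst), weight in edges.items():
        out_weight[src] += weight

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iterations):
        # Rank held by nodes without out-edges is redistributed to everyone.
        dangling = sum(rank[v] for v in nodes if out_weight[v] == 0.0)
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {v: base for v in nodes}
        for (src, dst), weight in edges.items():
            new_rank[dst] += damping * rank[src] * weight / out_weight[src]
        delta = sum(abs(new_rank[v] - rank[v]) for v in nodes)
        rank = new_rank
        if tol > delta:  # stop once the L1 change falls below tol
            break
    return rank


# Hypothetical toy network: a mining pool pays two miners, who forward to an exchange.
example_edges = {
    ("pool", "miner1"): 3.0,
    ("pool", "miner2"): 1.0,
    ("miner1", "exchange"): 2.0,
    ("miner2", "exchange"): 1.0,
    ("alice", "exchange"): 4.0,
}
ranks = weighted_pagerank(example_edges)
for address in sorted(ranks, key=ranks.get, reverse=True):
    print(f"{address:10s} {ranks[address]:.4f}")
```
        </preformat>
        <p>In this toy graph the exchange, which collects funds from every other branch, receives the highest score, and miner1 outranks miner2 because the pool sends it three times as much ETH.</p>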
        <p>Figure 3 shows that 58% of the addresses are controlled by exchanges, while
another 12% carry convincing tags related to mining pools. Exchange and mining
pool wallets occupy the top positions of our ranking, underlining the
effectiveness of PageRank: addresses related to mining pools allocate extensive
amounts of payouts to their subscribed miners, resulting in large out-degrees as
well as high accumulated transaction values. We can also see that the main wallets
belong to centralized exchanges, which distribute (and receive) large transaction
volumes to (and from) their deposit wallets, token contracts, etc.</p>
        <p>Our PageRank implementation successfully detects the most influential
accounts across the network, corresponding to the Hubs and Authorities that
connect various transactors and carry heavy flow weights.</p>
        <p>Focusing on the known accounts (with labels from Etherscan<sup>4</sup>), we
present in Figure 4 a network overview of the top hubs and authorities, with the
surrounding transactions as edges.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Typical Behavior Patterns of Exchanges' Deposit Wallets</title>
        <p>We investigated the transaction behavior associated with the exchange wallets.
Based on our findings, these behaviors can be grouped into three categories:<fn id="fn4"><label>4</label><p>Etherscan: <ext-link ext-link-type="uri" xlink:href="https://etherscan.io/">https://etherscan.io/</ext-link></p></fn></p>
        <list list-type="order">
          <list-item>
            <p>Frequently paying out a fixed, large value to certain exchanges' main wallets. As the scatter plot shows, the payout amount always stays around the same value.</p>
          </list-item>
          <list-item>
            <p>Frequently receiving funds from the same exchange main wallets and paying out to various token contracts. This is due to exchange-related activity: exchanges use external accounts as deposit addresses for collecting tokens based on trading needs.</p>
          </list-item>
          <list-item>
            <p>Frequently receiving funds from a group of "miner" accounts, with "proxy" accounts in between that clean out their received ETH within a short time frame. Usually, these addresses receive funds from miner accounts, which in turn are paid reasonable amounts by known mining pools; we assume these payments are mining rewards (usually around 0.11–0.12 ETH).</p>
          </list-item>
        </list>
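        <p>Two of the patterns above lend themselves to simple rule-based checks. The Python sketch below is a hypothetical illustration, not the procedure used in this study: it flags pattern 1 when repeated payouts have a small coefficient of variation, and pattern 3 when every incoming amount is forwarded again within a short time window. The function names, the thresholds (max_cv, min_count, max_delay) and the event format are all assumptions chosen for illustration.</p>
        <preformat>
```python
from statistics import mean, pstdev


def fixed_value_payouts(payout_values, max_cv=0.05, min_count=5):
    """Pattern 1 heuristic: many payouts whose amounts barely vary.

    Uses the coefficient of variation (std / mean); max_cv and min_count are
    illustrative thresholds, not values taken from the study.
    """
    if min_count > len(payout_values):
        return False  # too few payouts to call it a pattern
    avg = mean(payout_values)
    return avg > 0 and max_cv >= pstdev(payout_values) / avg


def quick_cleanout(events, max_delay=3600):
    """Pattern 3 heuristic: received ETH is forwarded within max_delay seconds.

    events: time-ordered list of (timestamp, +value) for incoming and
    (timestamp, -value) for outgoing transfers. Each time the balance becomes
    positive, it must drop back to ~0 within max_delay seconds.
    """
    balance, funded_since = 0.0, None
    for ts, delta in events:
        balance += delta
        if balance > 1e-9 and funded_since is None:
            funded_since = ts  # the account starts holding ETH
        elif not balance > 1e-9:
            if funded_since is not None and ts - funded_since > max_delay:
                return False  # funds were held too long
            funded_since = None
    return funded_since is None  # nothing is left sitting in the account


# Hypothetical data: near-constant payouts, and rewards of 0.12 and 0.11 ETH
# that are forwarded within minutes.
print(fixed_value_payouts([10.0, 10.1, 9.9, 10.0, 10.05]))  # True
print(quick_cleanout([(0, 0.12), (600, -0.12), (1000, 0.11), (1500, -0.11)]))  # True
```
        </preformat>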
        <p>Although we point out the three typical behaviors above, they are not
necessarily mutually exclusive: some addresses share more than one of the deduced
patterns. The behavior patterns explored here are based on the labels we have
gathered and may differ for other use cases.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>SANSA provides a scalable solution for reading and querying large-scale RDF
data and is compatible with Spark's machine learning and graph processing
libraries, such as GraphX. With conventional graph analysis tools, we successfully
identified Hubs and Authorities in the Ethereum transaction network and discovered
that they are mainly related to exchange wallet and mining pool activities.</p>
      <p>This pipeline also makes it possible to filter out top accounts that are
likely to be exchanges' deposit wallets. Furthermore, with the filtered top-ranked
accounts, the "mixing" patterns of exchanges' deposit wallets become recognizable.
This could be a promising tool for detecting previously unknown exchange wallets
and for gaining a deeper understanding of their behavior patterns in future
analyses.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Ermilov</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Lehmann</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Sejdiu</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Bühmann</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Westphal</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Stadler</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Bin</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Chakraborty</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Petzka</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Saleem</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Ngonga Ngomo</surname>, <given-names>A.C.</given-names></string-name>,
          <string-name><surname>Jabeen</surname>, <given-names>H.</given-names></string-name>:
          <article-title>The Tale of Sansa Spark</article-title>.
          In: <source>16th International Semantic Web Conference, Poster &amp; Demos</source>
          (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Lehmann</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Sejdiu</surname>, <given-names>G.</given-names></string-name>,
          <string-name><surname>Bühmann</surname>, <given-names>L.</given-names></string-name>,
          <string-name><surname>Westphal</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Stadler</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Ermilov</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Bin</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Chakraborty</surname>, <given-names>N.</given-names></string-name>,
          <string-name><surname>Saleem</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Ngonga Ngomo</surname>, <given-names>A.C.</given-names></string-name>,
          <string-name><surname>Jabeen</surname>, <given-names>H.</given-names></string-name>:
          <article-title>Distributed semantic analytics using the SANSA stack</article-title>.
          In: <source>ISWC Resources Track</source>
          (<year>2017</year>)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Pfeffer</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Beregszaszi</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Detrio</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Junge</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Chow</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Oancea</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Pietrzak</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Khatchadourian</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Bertolo</surname>, <given-names>S.</given-names></string-name>:
          <article-title>EthOn - An Ethereum ontology</article-title>
          (<year>2016</year>)
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Wood</surname>, <given-names>G.</given-names></string-name>:
          <article-title>Ethereum: A secure decentralised generalised transaction ledger</article-title>.
          <source>Ethereum project yellow paper</source> <volume>151</volume>,
          <fpage>1</fpage>–<lpage>32</lpage>
          (<year>2014</year>)
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Zaharia</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Chowdhury</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Das</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Dave</surname>, <given-names>A.</given-names></string-name>,
          <string-name><surname>Ma</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>McCauley</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Franklin</surname>, <given-names>M.J.</given-names></string-name>,
          <string-name><surname>Shenker</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Stoica</surname>, <given-names>I.</given-names></string-name>:
          <article-title>Resilient distributed datasets: A fault-tolerant abstraction for in-memory cluster computing</article-title>.
          In: <source>Proceedings of the 9th USENIX conference on Networked Systems Design and Implementation</source>,
          pp. <fpage>2</fpage>–<lpage>2</lpage>.
          <publisher-name>USENIX Association</publisher-name>
          (<year>2012</year>)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>