=Paper=
{{Paper
|id=Vol-2956/paper60
|storemode=property
|title=Rule-based Blockchain Knowledge Graphs: Declarative AI for Solving Industrial Blockchain Challenges
|pdfUrl=https://ceur-ws.org/Vol-2956/paper60.pdf
|volume=Vol-2956
|authors=Luigi Bellomarini,Giuseppe Galano,Markus Nissl,Emanuel Sallinger
|dblpUrl=https://dblp.org/rec/conf/ruleml/BellomariniGNS21
}}
==Rule-based Blockchain Knowledge Graphs: Declarative AI for Solving Industrial Blockchain Challenges==
<pdf width="1500px">https://ceur-ws.org/Vol-2956/paper60.pdf</pdf>
<pre>
      Rule-based Blockchain Knowledge Graphs:
              Declarative AI for Solving
           Industrial Blockchain Challenges

Luigi Bellomarini1 , Giuseppe Galano1 , Markus Nissl2 , and Emanuel Sallinger2,3
                                 1
                                  Banca d’Italia, Italy
                                     2
                                   TU Wien, Austria
                              3
                                University of Oxford, UK


        Abstract. Blockchains have an incredible impact on our economic sys-
        tems. Over the last years, more and more illicit activities have taken
        place not only via the traditional financial space but also via cryptocur-
        rencies, where tracking illicit activities is more complex. Yet, several
        blockchain toolkits have been introduced that explore blockchain data.
        There is a clear industrial need to bring together many of those valuable
        methods into a principled approach that upholds central aspects such as
        the ability to provide explainable analytics.
        In this paper, we report on an international industrial and academic
        collaboration that uses a declarative, rule-based approach on top of a
        Knowledge Graph system and describes the use-case of analyzing the
        Petya ransomware.

        Keywords: Blockchain Analytics · Knowledge Graphs.


1     Introduction

Business Case and Importance. Blockchains have an incredible impact on
our economic systems with a total market capitalization of 1500 billion US Dol-
lar [8]. Over the last years, a high number of financial transactions has been
executed via blockchains, leading to increased interest in different areas, such
as analytics or fraud detection. Detecting fraud and flagging illicit activities is
one of the key areas in the financial system. This requires the ability to detect
patterns, to reason over both the traditional financial space as well as via cryp-
tocurrencies. Thus, it is of primary importance for financial institutions in the
European and world economic systems to reason over blockchain data.
Technological Challenges. While several blockchain toolkits have been in-
troduced that include different algorithms for several tasks [3, 9, 12], there is a
    The views and opinions expressed in this paper are those of the authors and do not
    necessarily reflect the official policy or position of Banca d’Italia.
    Copyright c 2021 for this paper by its authors. Use permitted under Creative Com-
    mons License Attribution 4.0 International (CC BY 4.0).
clear industrial need to bring together many of those valuable, but ad-hoc com-
bined methods into a principled approach. This principled approach must uphold
central aspects such as the ability to provide explainable analytics, so that the
derived insights are based on a solid foundation. Meeting these challenges is one
of the traditional strong points of rule-based, declarative AI.
Solution. Researchers have recognized [9, 22] that blockchains share common
features with graph-like data structures and started modelling blockchain data
as a direct acyclic graph. This is possible thanks to the structural guarantees of
the rules that govern blockchain operation, such as that one block references only
one predecessor or that each block contains a logically ordered set of transactions.
By interpreting the blockchain as a graph, special graph traversal algorithms can
be used for querying the data such as calculating the shortest path between two
nodes, e.g., accounts, or generating sub-graphs [9].
    In this paper, we introduce a connection on top of the graph by interpreting
the blockchain as a logic-based Knowledge Graph. We show that blockchains
share different research questions with the Knowledge Graph domain. One par-
ticular area of such relationship is given by detecting illicit activities such as fraud
or ransomware payments [17], which include several tasks such as identifying the
real-world entities behind users [20], clustering of blockchain identities [24] and
devising economic indicators [22].
    Apart from this general connection, this paper is motivated by the challenge:
can a rule-based, declarative Knowledge Graph be used to provide an end-to-
end analysis of the Petya ransomware, a particularly widely discussed case in
media, e.g., [1, 23]. More concretely, we use the Knowledge Graph system Vada-
log [4] that is able to handle heterogeneous data sources and allows to formulate
declarative, and full-recursive queries extending Datalog. By integrating existing
analytical frameworks in the Knowledge Graph, we are able to formulate a wide
range of use-cases in a rule-based format over the blockchain as well as the entire
analytical frameworks. This gives us the possibility to provide answers to central
questions in the financial domain within one coherent system.
Main Contributions. To summarize, our main contributions are:
 – We identify a novel relationship between blockchains and Knowledge Graphs,
   which goes beyond the graph-like structure. We show that reasoning in
   Knowledge Graphs share common tasks with typical blockchain workflows.
 – We present a rule-based approach for detecting illicit activities in block-
   chains. By using Datalog rules we provide a transparent and declarative way
   to formalize specific questions.
 – We show our approach by constructing and querying the Petya ransomware
   graph. We provide several non-trivial answers to industrial use-cases.

Organization. The remainder of this paper is organized as follows: In Section 2
we provide background information required in the following chapters. Section 3
explains the relationship between a Knowledge Graph system and blockchains
in detail. In Section 4 we provide an in-depth analysis of the Petya ransomware.
We provide related work in Section 5 and conclude the paper in Section 6.
               Block      Transaction    Output     Address       Tag


                       Fig. 1. Graph view on Bitcoin structure.


2   Preliminaries

Blockchains. A blockchain is a type of distributed ledger, where blocks are struc-
tured in a single-linked list which is linked with a hash reference to the previous
block. Each block contains a list of transactions, which contain information re-
quired to transfer coins (and data) between different accounts. An account is
created by a private and public key pair. The private key is used to spend the
coins, while the (hash of) the public key is used to receive the coins, hence called
the address of the account. In Bitcoin, each transaction can have multiple inputs
and outputs either from one or from different users. An input is used to spend
a previous output, which creates an acyclic graph of spent outputs. The differ-
ence between the inputs (i.e., output values of the previous transaction) and the
output values in one transaction are the fee paid to the miner of the block [19].
Figure 1 gives a visualization of the Bitcoin structure as a graph, summarizing
the discussed concepts. White square represents blocks, dark squares stand for
transactions, white circles are outputs (and inputs), and dark (green) circles
represent addresses. In addition, the figure contains tags of addresses, which are
an external information required for blockchain analytics.

Mixing services. Mixing services are used to enhance the privacy of the user.
While mixing services are embedded into the protocol in some blockchains, the
usage in Bitcoin requires expertise. In case of Bitcoin, mixing tries to hide the
mappings between the inputs and outputs of the transaction taking the trans-
ferred values in consideration to obfuscate which input and output address be-
longs to the same user. In simple words, a perfectly mixed transaction would
have the same value for each input and output. It turned out, that many mix-
ing attempts are not perfect. Address-reuse, same output or input addresses, or
different output values make mixing attempts more or less useless [15]. Boltz-
mann [14] is a script that analyzes the so-called linkability of a single transaction
without considering information from different transactions. The linkability be-
tween a specific input and output of a transaction is defined as the ratio between
the number of valid mappings between inputs and outputs containing the specific
input and output and possible valid mappings given the values of the transaction.
Knowledge graphs. The definitions of the term Knowledge Graph are mani-
fold [5]. In this paper, we use the definition of Bellomarini et al. [4], who de-
scribe a Knowledge Graph as a semi-structured data model characterized by three
components: (i) a ground extensional component, that is, a set of relational con-
structs for schema and data (which can be effectively modeled as graphs or gen-
eralizations thereof ); (ii) an intensional component, that is, a set of inference
rules over the constructs of the ground extensional component; (iii) a derived ex-
tensional component that can be produced as the result of the application of the
inference rules over the ground extensional component (with the so-called “rea-
soning” process). We will use this definition in Section 3 to show the fundamental
connection between Knowledge Graphs and blockchain analytics.


3   Knowledge Graphs for Blockchains

As discussed in the previous sections, the Bitcoin blockchain structure can be
interpreted as a graph-like structure for different blockchain applications. While
this connection builds the fundamental relationship to Knowledge Graphs by
mapping the Bitcoin transaction structure to the extensional component of the
Knowledge Graph, we can identify several additional properties that overlap
between blockchains and the Knowledge Graph system. All the mentioned tasks
in Section 1, such as clustering, have in common that they infer new knowledge
based on existing data, which can be seen as generating new nodes and edges
(the derived extensional component) by using the intensional component. Hence,
Knowledge Graphs and blockchain tools share additional functionalities:

 – Shared reasoning tasks. The typical behavior of tasks involving block-
   chains highly relates to Knowledge Graph reasoning. For example, identi-
   fying the real-world entities behind accounts depends on the clustering of
   several accounts into one real-world entity, which is comparable to entity
   resolution [7], i.e., decide whether two accounts are the same real-world en-
   tity, and link prediction [25], i.e., decide whether the account is part of the
   real-world entity.
 – Shared interference rules. Knowledge graph and blockchains share op-
   erationalized domain knowledge that is necessary to model domain-specific
   rules and patterns such as the representation of known money laundering or
   fraud schemes for analytical tasks.

While so far our focus has been on the abstract connection, we now concentrate
on a very concrete use-case, namely the Petya ransomware challenge. We do
want to mention that the provided description will only focus on the transaction
level, yet Knowledge Graphs are also made to perfectly model smart contracts
as part of the intensional component, i.e., the contract as rules. Smart contracts
are applications living inside the blockchains such as lottery services or supply
chain applications.
                                         Third-Party
                                         Tools


                                                       Discover


                                                   Knowledge                                   Output(...)
           Blockchain                                Graph
                               Construction                                Reasoning


    Tagging Services                            Fee Distribution                                   Boltzmann
                                                Exchange Tagging
                  Ad


                                                                                                             Li
                       dre


                                                                                                             nk
                          ss


                                                                                                              ab
                               Ta


                                                                                                                  ili
                                 gs


                                                                                                                   ty
                                              Transaction                          Output                               Linkability              CoinFlow
       Bitcoin                                  Graph                            Input Graph                              Graph
                   Construction                                    Enrichment                          Enrichment                     Analysis


                                        Fig. 2. High-Level Workflow of the Petya Analysis


4        Use Case: Tracking the Petya Ransomware
In this main section, we report how our approach performed on a concrete use-
case, namely the analysis of the Petya ransomware. This is only one example, but
representative for demonstrating how a concrete use-case can be solved using a
rule-based Knowledge Graph system. In fact, it requires to give answers to three
questions, spanning the entire life cycle of Knowledge Graphs:
 – How to construct a Knowledge Graph for analyzing the Petya ransomware
   in order to include relevant data sources. (Section 4.1)
 – How to discover new facts based on the data in the Knowledge Graph to
   get better insights. (Section 4.2)
 – How to formulate and use reasoning queries to perform blockchain analytics
   and extract relevant information. (Section 4.3)
In Figure 2 we detailed the concrete steps of the workflow to address the previ-
ously described questions. It begins with the construction by using the blockchain
and tagging services, then moves on to handling distinct discovery processes
(e.g., fee distribution, linkability) on three different graph perspectives, before
concluding with a coinflow analysis as a reasoning task. We will provide details
on each part in the respective subsections.

4.1         Petya Knowledge Graph Construction
Knowledge Graph construction has the goal of collecting and modeling relevant
data sources as the extensional component in a Knowledge Graph system. In the
case of the Petya Knowledge Graph, we use two different data sources, namely (i)
the Bitcoin blockchain itself for the basic data and (ii) tags to Bitcoin addresses
belonging to exchanges (i.e., services that swap different cryptocurrencies with
other parties) from different reliable online sources.
Bitcoin Model. There exist different views on the Bitcoin blockchain, for exam-
ple the default structure of the blockchain as defined in Section 2, or the linking
of transactions on the transaction or on the output level. Since efficient Knowl-
edge Graph Management Systems allow to convert between different views with
negligible performance overhead, we define the complete blockchain structure as
our extensional component of the KG and then convert this structure to different
views with rules for analysis during the discovery step (see Section 4.2, where
we will show graphical illustrations). The main relations are:

                            Block(blockHash, number, prevBlock)                (1)
                                         Tx(txHash, blockHash)                 (2)
          TxIn(txHash, inIndex, spentT xHash, spentT xOutIndex)                (3)
                                 TxOut(txHash, outIndex, value)                (4)

Relations 1-4 define the structure of the blockchain. Each transaction is part of
a block and has several inputs and outputs. An input references to an output,
which it consumes, while a block references to its predecessor (as described
in Section 2). We want to point out that any additional attributes, like the
address of the output, will be provided by additional relations, if required, e.g.,
TxOutAddr (txHash, outIndex, addr).
Exchange Model. For modelling exchanges we use facts in the Knowledge
Graph that mark an address as an exchange, i.e., Exchange(addr).
Petya Blockchain Source. For Petya, we extracted the Bitcoin blockchain data
only in the relevant time frame of the event, i.e., between the blocks 473.000 and
490.000. In detail, we scanned the blockchain for transactions sending money to
the ransomware address “1Mz7153h. . . ” and imported all following transactions
in the Knowledge Graph until we reached one of our three termination crite-
ria: (i) We identified an exchange by checking the output address against the
exchange model. (ii) We reached the end of our interval of interest, i.e., block
number 490.000. (iii) We exceeded a depth-limit of ten in our search graph. This
depth-limit was chosen due to the exponentially increased breadth of the trans-
action tree in the deeper levels and the reduced likelihood that the coins in the
transaction are still belonging to the Petya user.

4.2   Petya Knowledge Graph Discovery
Knowledge Graph discovery extends the Knowledge Graph by deriving the ex-
tensional component through applying rules of the intensional component. In
case of Petya, we extend the Knowledge Graph during discovery with additional
facts to gain more insight during the analysis. These facts can be either added by
our own rules or by rules interacting with other blockchain analytic frameworks.
In order to integrate additional knowledge, it is required to answer the following
three highly important questions:
 – How can we use different views of the blockchain for analytical tasks?
 – How can we use data provided by other analytical tools? We show this by
   calculating the linkability (introduced in Section 2) between in- and outputs.
 – How can we use existing knowledge to extend the Knowledge Graph with
   our own rules? We show this by extending the list of exchanges.
In order to answer these questions, we show rules that describe and highlight
the logic-based approach. These rules are given and executed in Vadalog [4].
     (a)              Tx In/Out     (b)           Output-Input   (c)            Link
                      Fee                         Fee                           Fee


    Fig. 3. Graph centered on (a) transactions (b) outputs/inputs (c) linkability


KG structure. The goal of our Petya analysis is to investigate the coin flow
between the ransomware address and possible exchanges. As briefly mentioned
in the previous section, depending on the task, different views of the blockchain
as graphs are preferable. Figure 3 visualizes three different views. In (a) we show
the transaction graph of the blockchain, which we use to calculate the fees and
can be directly extracted from the blockchain data, in (b) we show the output-
input graph with all possible edges of a single transaction, which can be inferred
from the blockchain data and in (c) we show the link graph, where outputs and
inputs are linked if there is a known flow of coins between inputs and outputs of
a transaction, which has to be estimated from the blockchain data. These three
views are complementary, since in case of coin flow analysis, we are required
to know general properties of a transaction, as shown in (a)—e.g., fees, input
and output volumes for weighing the coin flow—as well as the actual flow of the
coins. In order to capture the flow, we introduce possible “edges” (b) between
the inputs and outputs of the transaction, since this information is not present
in the blockchain. Note that the link graph (c) is an uncertain graph, as the
available information in the blockchain is only sufficient to predict such links
with a certain likelihood (e.g., via linkability discussed in the preliminaries).


Algorithm 1 Fee Distribution
                                  TxOut(t, i, v), ov = sum(v) → OutVolume(t, ov)       (1)
            TxIn(t, i, st, si), TxOut(st, si, v), iv = sum(v) → InVolume(t, iv)        (2)
           InVolume(t, iv), OutVolume(t, ov), f = iv − ov → Fee(t, f )                 (3)
                         TxIn(t, i, st, si), TxOut(st, si, v),
                 InVolume(t, iv), Fee(t, f ), if = (v/iv) ∗ f → InFee(t,i,if)          (4)


Algorithm 1 presents the concept of “fee”. Rules 1 and 2 define the total input
and output volume per transaction. Rule 3 uses these values to define the fee.
Rule 4 assigns to each input of the transaction the weighted fee according to the
ratio between the input value and the total input volume.
Algorithm 2 Mapping from Blockchain View to Output-Input (Oi) View
                 TxIn(t, i, st, si) → OiNode(st, si, t, i)                                (5)
             OiNode(ot, oi, it, ii) → ∃id OiId(id), OiOut(id, ot, oi), OiIn(id, it, ii)   (6)
 OiIn(id1, t, ii), OiOut(id2, t, oi) → PossibleEdge(t, ii, oi), PossibleOiEdge(id1, id2) (7)


Algorithm 2 presents the conversion between the blockchain view and the output
view. Rule 5 rewrites the blockchain in- and outputs identified by its transaction
and index to create a combined view of them. Rule 6 assigns each output-input
node a unique id and splits it into an output and input part. Rule 7 creates a
possible edge between in and outputs of a single transaction for the output-input
view as well as the blockchain view.
Linkability. As described, we extend the possible edges of the output-input-
view (b) of the Knowledge Graph with some metric that describes the likelihood
of an existing coin flow from an input to an output. The metric we choose here is
linkability (as discussed in Section 2). For this, we compute the linkability with
Boltzmann for each transaction, where a calculation is feasible in time. We use
the default timeout value of 600 seconds and limit the maximum to 16 input
or output values (the default is 12). Algorithm 3 shows the application of the
output of the Boltzmann computation (i.e., a value between 0 and 1 for each
address combination), where we define only edges larger than or equal to the
threshold value of the linkability score, here 100%, as a link.


Algorithm 3 Linkability-based Link View
   PossibleOiEdge(id1, id2), OiAddr(id1, a1), OiAddr(id2, a2),
                   OiIn(id1, t, ), Linkability(t, a1, a2, l), l ≥ 1.0 → Link(id1, id2)    (8)


Note that we allow to include the computation of external tools in the reasoning
process (here with accessing the Linkability fact), allowing an interaction with
highly optimized, task-specific tools. However, the usage of such tools causes a
trade-off between explainability and performance, as such computations provide
us only with the output value (i.e., a black-box algorithm). For interested readers
we provide excerpts of a non-trivial, fully explainable (white-box) approach,
expressed in a rule-based format, in the appendix.
Exchange identification. Online sources provide some, but not nearly all ad-
dresses of exchanges. Therefore, our goal is to identify additional exchanges based
on existing exchange data.
    For this, we analyze different transactions where some inputs are from ex-
changes. Consider for example the transaction “0f44a800. . . ” presented in Fig-
ure 4, which is part of our Petya graph. The figure shows that all inputs have
a 2-of-3 multisig format (i.e., two of three users have to sign the input that it
is accepted in the transaction) and that four of these inputs are tagged by a
      35WPh62DFm7wexKa8RiWnreZCq1vschzsW   0.03048738 BTC   36eMKti12qR3gYCjadyE1NxEC6Vd2uyhWj   0.06480000 BTC    3MUm3nMHWCf9gVsPGm5eNVQV8s798GQ9SW          0.19950000 BTC
        multisig 2 of 3                                       multisig 2 of 3   KRAKEN                                 multisig 2 of 3

      3L7XQXfPyjUfEF6ty1i7oDApu7VgwYCtgz   0.03204583 BTC   3HsNajN4vjZRdcLUGBLi522sjXiXg9y31c   0.03770226 BTC    32rZrzEsrSgTD1wqwKBbsorS4qf4ZkRzCr          0.07380032 BTC
        multisig 2 of 3                                       multisig 2 of 3                                          multisig 2 of 3

      3Q3gyTftkHEhL67WayUam3ByRmbMcjk6CR   0.04171262 BTC   3JoRX4YCWos3q8uQWzHJfhRdRncFnbTn7f   0.07402023 BTC    3EDAauuNWXdNKc6c66sYXbprXjbFkNVUUq          0.32733027 BTC
        multisig 2 of 3                                       multisig 2 of 3   KRAKEN                                 multisig 2 of 3

      34N5ugzr5k8Ey5haeXCLpfGcePGKcocmTH   0.03616491 BTC   3GMwc8uhgWnSs8eea7kSuEdDyhEWxZ3xvS   0.14822859 BTC    3FymNzrrXEC7D4BgtLh3t6GLF43v6YmE9Z          2.79728597 BTC
        multisig 2 of 3                                       multisig 2 of 3                                          multisig 2 of 3        KRAKEN

      392bX3VymTcdwTEKHR5LG6Q8SydWhJVT4B   0.07370000 BTC   35FXtqThxz6irYXQ4odENAGr6yyxgrq7c1   0.11487594 BTC    36YEL7swqA1TiD4mFUQkzSsttCZYaSftmP           0.58000000 BTC
        multisig 2 of 3                                       multisig 2 of 3                                          multisig 2 of 3        KRAKEN

      34sdZiSp74dWHXH3UiQQoC3HwD3j6rzyTa   0.05600099 BTC   37oXzgE7oGY9BHywoMrLeQM5w6wXX4KQEF   0.65461839 BTC    3C8J4XQ32YkzXVBJ35um2eowa6jzPiLKiU         144.53765255 BTC
        multisig 2 of 3                                       multisig 2 of 3                                          multisig 2 of 3


               Fig. 4. Transaction, where all inputs have a 2-of-3 multisig format


     (d)                          EX                                                     EX      (e)                                                        Link
                                                                                                                                                          Fee
                                                                                                                                                                                                 8
                                                                                                                  40                                                               20
                      EX                                                                                                       30                      40 Coin Flow

                                                                                                                  60                             10
                          Address
                                                                                         EX
      EX                  Existing Tag
                                                                                                                                                                                            23   Coin Flow
                                                                                                                                                   4
      EX                  New Tag                                                                                                        36            10
                                                                                                                   36
                                                                                                                                         54


                            Fig. 5. Graph centered on (d) exchange tags (e) coin flow                                                                                                   7            3
                                                                                                                                                                                            9


specific exchange which we shall refer to as “exchange X”, which is an exchange
provider for cryptocurrencies. This transaction reveals that there exist some
exchange transactions which follow a specific format. In this case, we assume
that all inputs are from “exchange X”, because they share the same input-type,
e.g., 2-of-3 multisig. Therefore, we define a generalized heuristic:
If a transaction has several inputs and one input is tagged as exchange, then all
inputs have to be tagged as exchange.
The idea is that it is unlikely that an exchange collaborates with a different
party to send transactions. This heuristic allows us to cut the reasoning space
by reducing the number of relevant nodes.
    The heuristic is visualized in Figure 5 (d), where an existing tag propagates
this information to the transaction and to the other addresses of the input nodes.


Algorithm 4 Exchange Tagging
                   TxIn(t, i, st, si), TxOutAddr(st, si, a), Exchange(a) → ExchangeTx(t)                                                                                          (9)
       TxIn(t, i, st, si), ExchangeTx(t), TxOutAddr(st, si, a), → Exchange(a)                                                                                                    (10)


Algorithm 4 presents this in rules. Rule 9 models that every transaction having
as input an address that belongs to an exchange is an exchange transaction;
Rule 10 tags each address of the input of such a transaction as exchange.
           Table 1. Statistics of the constructed Petya Knowledge Graph

            Nodes in the graph                                     102170
            Edges in the graph                                     768381
            Exchanges in the graph                                  25253
            Bitcoins received at ransomware address             4.3907926
            Bitcoins sent at ransomware address                   4.03551
            Bitcoins not sent at ransomware address              0.355282
            Transactions in the graph                                1958
            Transactions in the graph with linkability score         1514
            Transactions in the graph without linkability score       444


4.3   Petya Analysis
In this section, we provide first a general overview of some interesting statistical
properties followed by a detail analysis of the coin flow.
Statistical properties. Our goal is to get an overview of the KG, before going
into detailed tasks. For this, stakeholders asked the following questions:
 – How many coins have been received and spent by the ransomware address?
   This question indicates the number of relevant coins for analysis.
 – For how many transactions are not we able to compute the linkability? Can
   the linkability computation failure be associated with recurring patterns?
   These questions indicate the performance of the linkability computation.
Table 1 summarizes the answers to our questions. Our graph has about 100.000
nodes, thereof are about 25% exchanges, with about 770.000 edges between the
nodes. Petya collected about 4.4 Bitcoins in the chosen block-interval, whereof
about 92% have been moved and thus can be tracked in the coin flow analysis.
To answer the second question, we see in Table 1 that we were not able to
compute the linkability of 23% of the transactions. In a detailed analysis, we
analyzed these transactions to identify clusters in input and output combina-
tions. We identified that 286 transactions have 2 outputs (i.e., 64.4% of the total
transactions without linkability). We use this finding, to improve the linkability
in the graph by following “simplified” Bitcoin flow heuristic [6]:
A transaction with two outputs is a payment transaction, where one value is the
payment and one the change value and thus both outputs are linked with 100%.
We use this heuristic to extend our “links” in the following coin flow analysis.
Coin flow analysis. The goal of this analysis is to identify the coin flows of the
Petya ransomware. For this, we ask following questions:
 – How many coins have reached a known point, e.g., exchange, with 100%
   (75%, 50%) linkability? This is interesting, since exchanges (at least where
   regulated) are subject to KYC rules. In addition, exchanges are a natural
   point to stop the coin flow analysis as the ownership of the coin changes.
 – How many coins could be additional tracked with the exchange heuristic,
   how many with the linkability heuristic? This gives us feedback about the
   improvement contributed by our defined heuristics.
 – How many Bitcoins have been spent as fee or got intractable? These values
   are interesting to complete the overview of the coin flow. Fees are determined
   by the difference of the input and output of a transaction and get paid to
   a unrelated account (i.e., the account mining the block) which account is
   not further relevant for the analysis and are “lost”. Intractable coins can-
   not identify the user due to successfully mixing techniques, but are still of
   interest.

Algorithm 5 Coin Flow
                                Link(id1 , id2 ), OiVal(id2 , val) → VLink(id1 , id2 , val)   (11)
                                  VLink(id1 , , v), s = sum(v) → OutVolume(id1 , s)           (12)
                              OiDepth(id, 0), OiVal(id, vsent ) → VNode(id, vsent )           (13)
   VNode(id1 , vsent ), Link(id1 , id2 ), OutVolume(id1 , vov ),
         InFee(id1 , vf ee ), OiVal(id1 , vin ), OiVal(id2 , vout ),
  sentR = vsent /vin , totalSent = vsent − (vf ee ∗ sentR),
       receiveR = vout /vov , vrecv = totalSent ∗ receiveR → VNode(id2 , vrecv )              (14)


In order to calculate the coin flow, we use a graph traversal query, where we
follow all paths with a threshold value (e.g., 100%) of the linkability. Since some
paths contain also other inputs, the actual value transferred in an transaction
may be higher than the actual value currently flowing through this path. We
use weighted fees to approximate the fees of the input and define the following
heuristic to approximate the flow value to the output:
The flow to an output is weighted according to all relevant outputs of the input.
An output is relevant, if the linkability is higher or equal as the threshold.
Consider Figure 5 (e). We have three inputs i1 of 60, i2 of 30 and i3 of 10
coins, two outputs, o1 of 36 and o2 of 54 coins, a linkability of i1 with 0.0 for
o1 and 1.0 for o2 and a input flow of i1 of 40. Then the fee is 10 coins, whereof
input i1 pays 6 coins, i2 3 coins, and i3 1 coin. The input flow is 40/60 = 2/3
and hence the fee of the input flow of i1 is 4 coins. The total amount of relevant
output is 54, and the weighted output of o1 is 54/54 = 1.0. Then the flow from
i1 to o1 is (40 − 4) ∗ 1.0 = 36 coins.
Algorithm 5 presents the coin-flow computation. Rule 11 annotates each link
with the value of the output, which is used in Rule 12 to calculate the output
volume of all valid outputs. Rule 13 initializes the nodes at depth 0 (i.e., transac-
tions to the ransomware address) with the transferred value. Rule 14 propagates
this value recursively to the following nodes according to the defined heuristics.
    Table 2 provides an overview of the results. We see that 96.7% of the value
is tractable (89% to an exchange and 7.7% as fee), which leaves only 3.3% of
the coins intractable. While both heuristics improved the result, we see that the
exchange heuristic has a much higher impact on the result and increased the
coins, which flow to exchanges, from 84.3% up to 88.9%.
Even without the two heuristic we could track more than 84% of the coins to
exchanges. This can be explained by an exchange at depth 3, which was used
                 497b1a97...[0]     3917231    9d0b940a...[0]   3917197     bcc3d443...[0]


                                               a2a6c377...[1]   43478905    4f7ed4f9...[0]
                                   43509166
                 6408a062...[0]    339588561
                                                                 1899088
                                               a2a6c377...[0]               b5378d27...[0]
                                                                337645014

                                                                            b5378d27...[1]


                Fig. 6. Fund movements to the high value exchange.

                                  Table 2. Coin Flow Statistics

           Bitcoin sent to exchanges                             3.5928
           Bitcoin spent as fee                                  0.308909
           Bitcoin intractable                                   0.133801
           Bitcoin sent to exchanges (linkability 50%)           3.5928
           Bitcoin sent to exchanges (w/o exchange heuristic) 3.40271
           Bitcoin sent to exchanges (w/o linkability heuristic) 3.586


to cash out 83.7% of all the coins. Figure 6 visualizes this fund movement by
showing the movements after the first consolidations at the lowest depth until
the discussed exchange.
We would like to point out that the results are derived using mathematical rela-
tionships of Bitcoin transactions and heuristics given by domain experts, so we
can guarantee accuracy only with respect to these heuristics. The main goal of
this research is to demonstrate how rule-based systems can be used for blockchain
analytics, with an emphasis on their ease of use for analytical reasoning tasks
and seamless integration with sophisticated algorithms.


5   Related Work

The related work in the area of blockchain analytics is manifold, as multiple
problems have been studied in detail such as clustering of blockchain identi-
ties [24], de-anonymizing users [20], computation of economic indicators [22], for
example to predict future Bitcoin prices [2] or transaction fees, or the identifica-
tion of criminal activities like ransomware payments [17] or financial fraud [18].
We focus in the following on well-established tools and provide an overview on
related ransomware analytics.
    Bartoletti et al. [3] created a general Scala library for blockchain analytics
focusing on common tasks such as address clustering or user deanonymization to
generate a shared view between them. BlockSci [12] is an in-memory database
optimized for blockchain analytics for Bitcoin-like blockchains that provides ten
pre-defined heuristics for linking addresses and an interface for map-reduce jobs.
GraphSense [9, 10] builds upon BlockSci and provides an analytics platform on
a NoSQL-basis working on higher-level graph structures such as on the address
(i.e., all transactions between two addresses) or entity (i.e., a clustering of ad-
dresses) level and supports the collaborative collection of address tags. Typical
clustering heuristics include the multiple-input heuristics which assumes that
inputs spent in the same transactions are controlled by the same user (which
does not work for mixing transactions where different users are contributing to
the input), or the change address heuristic which links the second (lower) output
to the input addresses [26]. While those systems focus on a scalable and efficient
analytic platform by providing tools, the focus of this work is to study on a
concrete scenario the ease of use of rule-based systems for blockchain analytics.
    Multiple studies have investigated ransomware activity in the Bitcoin net-
work. One main direction of the work is the detection of addresses related to
ransomware [16, 21]. One ransomware studied is CryptoLocker [13, 17], where
similar patterns of address usage were detected and a tracking of the coins to
a dark web marketplace via an exchange have been reported. Huang et al. [11]
estimated a lower bound of ransomware revenue of about $16 millions USD over
two years in 2018 and studied the movement of ransomware payments in the Bit-
coin network. In comparison, our work focuses on the rule aspect for blockchain
analytics and not on new techniques regarding ransomware detection and track-
ing. Our proposal of adopting Knowledge Graphs can be seen as complementary
to existing, highly optimized, task-specific algorithms, bridging the gap between
implicitly gained explainability via rules and highly optimized tasks such as clus-
tering algorithms, which can perfectly co-exist (as mentioned in Section 4.2). To
the best of our knowledge, rule-based Knowledge Graphs for blockchain analytics
have not been considered so far.

6   Conclusion
In this paper, we showed how the key industrial challenge of tracking illicit
activities in blockchains can be solved via a rule-based, declarative AI approach.
We discussed our methodology for the entire life cycle of the Knowledge Graph,
including different views on the blockchain data, integration of third-party tools
and a analytical scenario.
Importance and impact. It is a key mission for the financial system to prevent
being exploited for illicit activities. For this reason, being able to detect and
highlight the risks in new systems, such as blockchains, is crucial to its correct
operations. The approach showed impact in a variety of aspects:
 – Simple integration of different tools to use in analysis
 – Ease of trying out new ideas and using explainable, rule-based formulation
   for interaction between stakeholders.
 – Low barrier of entry, as only high-level technical knowledge is required, and
   more people can use and understand the platform.
Our approach allows fully recursive blockchain queries in a declarative way, which
increases the maintainability of blockchain analytics scripts. In future work, we
want to improve our system by considering specific performance optimizations
for acyclic blockchain graphs, extend the Knowledge Graph to the complete
Bitcoin blockchain and develop or use more sophisticated heuristics to improve
the overall result.
Acknowledgements
The financial support by the Vienna Science and Technology Fund (WWTF)
grant VRG18-013 is gratefully acknowledged.


References
 1. Abrams, L.: Petya ransomware’s encryption defeated and password generator
    released. https://www.bleepingcomputer.com/news/security/petya-ransomwares-
    encryption-defeated-and-password-generator-released/ (2016), [Online; accessed
    2021-07-29]
 2. Akcora, C.G., Dey, A.K., Gel, Y.R., Kantarcioglu, M.: Forecasting bitcoin price
    with graph chainlets. In: PAKDD (2018)
 3. Bartoletti, M., Lande, S., Pompianu, L., Bracciali, A.: A general framework for
    blockchain analytics. In: SERIAL@Middleware (2017)
 4. Bellomarini, L., Fakhoury, D., Gottlob, G., Sallinger, E.: Knowledge graphs and
    enterprise AI: the promise of an enabling technology. In: ICDE (2019)
 5. Bergman,      M.K.:    A     common       sense   view    of     knowledge      graphs.
    https://www.mkbergman.com/2244/a-common-sense-view-of-knowledge-graphs/
    (2019), [Online; accessed 2020-04-19]
 6. bitcoin.it: Change. https://en.bitcoin.it/wiki/Change, [Online; accessed 2020-04-
    19]
 7. Christen, P.: Data Matching - Concepts and Techniques for Record Linkage, Entity
    Resolution, and Duplicate Detection. Springer (2012)
 8. CoinMarketCap:             Total         cryptocurrency            market          cap.
    https://coinmarketcap.com/charts/ (2021), [Online; accessed 2021-07-29]
 9. Haslhofer, B., Karl, R., Filtz, E.: O bitcoin where art thou? insight into large-scale
    transaction graphs. In: SEMANTiCS (2016)
10. Haslhofer, B., Stütz, R., Romiti, M., King, R.: Graphsense: A general-purpose
    cryptoasset analytics platform. CoRR abs/2102.13613 (2021)
11. Huang, D.Y., Aliapoulios, M.M., Li, V.G., Invernizzi, L., Bursztein, E., McRoberts,
    K., Levin, J., Levchenko, K., Snoeren, A.C., McCoy, D.: Tracking ransomware
    end-to-end. In: IEEE Symposium on Security and Privacy. pp. 618–631. IEEE
    Computer Society (2018)
12. Kalodner, H.A., Möser, M., Lee, K., Goldfeder, S., Plattner, M., Chator, A.,
    Narayanan, A.: Blocksci: Design and applications of a blockchain analysis platform.
    In: USENIX Security Symposium. pp. 2721–2738. USENIX Association (2020)
13. Kharraz, A., Robertson, W.K., Balzarotti, D., Bilge, L., Kirda, E.: Cutting the
    gordian knot: A look under the hood of ransomware attacks. In: DIMVA. Lecture
    Notes in Computer Science, vol. 9148, pp. 3–24. Springer (2015)
14. LaurentMT: A standard interface for tokens. https://github.com/Samourai-
    Wallet/boltzmann, [Online; accessed 2020-04-19]
15. LaurentMT:         Bitcoin       transactions      &      privacy        (part      1).
    https://gist.github.com/LaurentMT/e758767ca4038ac40aaf              (2017),    [Online;
    accessed 2020-04-19]
16. Li, Y., Cai, Y., Tian, H., Xue, G., Zheng, Z.: Identifying illicit addresses in bitcoin
    network. In: BlockSys. Communications in Computer and Information Science,
    vol. 1267, pp. 99–111. Springer (2020)
17. Liao, K., Zhao, Z., Doupé, A., Ahn, G.: Behind closed doors: measurement and
    analysis of cryptolocker ransoms in bitcoin. In: eCrime (2016)
18. Möser, M., Böhme, R., Breuker, D.: Towards risk scoring of bitcoin transactions.
    In: Financial Cryptography Workshops (2014)
19. Nakamoto,      S.:    Bitcoin:   A    peer-to-peer    electronic   cash     system.
    http://bitcoin.org/bitcoin.pdf (2008), [Online; accessed 2019-11-19]
20. Ober, M., Katzenbeisser, S., Hamacher, K.: Structure and anonymity of the bitcoin
    transaction graph. Future Internet (2013)
21. Paquet-Clouston, M., Haslhofer, B., Dupont, B.: Ransomware payments in the
    bitcoin ecosystem. J. Cybersecur. 5(1), tyz003 (2019)
22. Ron, D., Shamir, A.: Quantitative analysis of the full bitcoin transaction graph.
    In: Financial Cryptography (2013)
23. Solon, O., Hern, A.: Petya’ransomware attack: what is it and how can it be stopped.
    The Guardian (2017)
24. Spagnuolo, M., Maggi, F., Zanero, S.: Bitiodine: Extracting intelligence from the
    bitcoin network. In: Financial Cryptography (2014)
25. Taskar, B., Wong, M.F., Abbeel, P., Koller, D.: Link prediction in relational data.
    In: NIPS (2003)
26. Zhang, Y., Wang, J., Luo, J.: Heuristic-based address clustering in bitcoin. IEEE
    Access 8, 210582–210591 (2020)


A        Rule-based Linkability Computation

Assume, we already have created a list of valid combinations by starting with
the parent of all inputs linked to all outputs, followed recursively by all non-
intersecting pairs of valid input-output sets (i.e., a set is valid if the difference
between input volume and output volume is greater or equal than zero and
smaller or equal than the fee), which union of input sets and output sets match
the parent. Such a list is presented in Table 3 for four inputs with the values
(30, 6, 3, 3) and two outputs with the values (10, 15) with the according max
depth of each child.


                             Table 3. Example of Linkability Calculation

       Left Child           Right Child      Depth Map. Use     Left Child     Right Child      Depth Map. Use

  {}         {}      {i0,i1,i2,i3} {o0,o1}       0   16   1 {i1,i2,i3} {o0}   {i0}   {o1}           1    1   1

  {i3}       {}      {i0,i1,i2}    {o0,o1}       1    5   1 {i2}      {}      {i0,i1} {o0,o1}       2    2   1

  {i0,i3}    {o0,o1} {i1,i2}       {}            1    2   1 {i0,i2}   {o1,o2} {i1}   {}             2    1   1

  {i1,i3}    {}      {i0,i2}       {o0,o1}       1    2   1 {i1,i2}   {}      {i0}   {o0,o1}        2    1   1

  {i2,i3}    {}      {i0,i1}       {o0,o1}       1    2   1 {i2}      {}      {i1}   {}             2    1   1

  {i0,i1,i3} {o0,01} {i2}          {}            1    1   1 {i2}      {}      {i0}   {o0,o1}        2    1   1

  {i0,i2,i3} {o0,o1} {i1}          {}            1    1   1 {i1}      {}      {i0}   {o0,o1}        3    1   2

  {i1,i2,i3} {}      {i0}          {o0,o1}       1    1   1


For calculating the linkability, we have to calculate the number of valid mappings
and the actual number of usages of each (partial) mapping. This calculation is
shown in Figure 7, where (a) counts the number of mappings and (b) counts
the number of usages. These numbers are used to calculate the final linkability
between each input and output value by counting each left child, where the input
                                            i0,i1,i2   2            i0,i1                                                        i0,i1,i2   1           i0,i1
                                   i3                                                                                   i3
 (a)                        5                o0,o1
                                                            i2
                                                                    o0,o1         (b)                             1               o0,o1
                                                                                                                                                 i2
                                                                                                                                                        o0,o1

                                                       1                                                                                    1
                                 i0,i3                     i0,i2                                                      i0,i3                     i0,i2
                                             i1,i2                   i1                                                           i1,i2                  i1
                                 o0,o1                     o0,o1                                                      o0,o1                     o0,o1
                            2                                                                                     1
                                                       1                                                                                    1
                                            i0,i2                    i0                                                          i0,i2                   i0
                            2     i1,i3                    i1,i2                                                  1    i1,i3                    i1,i2
                                            o0,o1                   o0,o1                                                        o0,o1                  o0,o1
                                                       1                                                                                    1

                            1    i0,i1,i3                                                                         1   i0,i1,i3
                                              i2            i2       i1                                                            i2            i2      i1
                                  o0,01                                                                                o0,01

                                                       1                                                                                    1
 16       I0,I1,I2,I3                                                                 1 I0,I1,I2,I3 I0,I1,I2,I3
                            1    i0,i2,i3                            i0                                           1   i0,i2,i3                           i0
            o0,o1                             i1            i2                              o0,o1      o0,o1                       i1            i2
                                  o0,01                             o0,o1                                              o0,01                            o0,o1

                            1                                                                                     1
                                             i0                                                                                   i0
                                 i1,i2,i3                                                                             i1,i2,i3
                                            o0,o1                                                                                o0,o1
                                                                            1            i0                                                                     2         i0
                                                                                 i1                                                                                 i1
                            1                                                           o0,o1                     1                                                      o0,o1

                                 i1,i2,i3     i0                                                                      i1,i2,i3     i0
                                    o0        o1                                                                         o0        o1


                            2               i0,i1                                                                 1              i0,i1
                                  i2,i3                                                                                i2,i3
                                            o0,o1                                                                                o0,o1


                                          Fig. 7. Calculation of (a) Mappings and (b) Usages


and output value is present, used ∗ mappings times and each right child, where
the values are present, used times. The multiplication of the left child considers
the number of sub-mappings provided by the right child.
Algorithm 6 defines the number of mappings (a). Rule 1 defines for each element,
whether it has a child. Rule 2 uses this information to set the number of mappings
to one for all elements that have no children. Rule 3 recursively defines the
number of mappings by the sum of the child mappings plus its own mapping.


Algorithm 6 Linkability: Calculation of Mappings
                          List(il , ol , ir , or , ), List(cil , col , cir , cor , ),
                                                                   ir = cil |cir , or = col |cor → HasChild(il , ol , ir , or )                                                  (1)
                        List(il , ol , ir , or , ), notHasChild(il , ol , ir , or ) → Mapping(il , ol , ir , or , 1)                                                             (2)
           List(il , ol , ir , or , ), Mapping(cil , col , cir , cor , v),
                                                                   ir = cil |cir , or = col |cor ,
                                  c = msum(v, < cil , col , cir , cor >) + 1 → Mapping(il , ol , ir , or , c)                                                                    (3)


Algorithm 7 presents the calculation of the number of usages for each mapping
(b). For this, we init in Rule 4 the root mapping with 1 and define in Rule 5
recursively the number of usages as the sum of its parents.


Algorithm 7 Linkability: Calculation of Usage per (partial) Mapping
                                                                                List(il , ol , ir , or , 0) → Usage(il , ol , ir , or , 1)                                       (4)
       List(il , ol , ir , or , ), Usage(ilp , olp , ip , op , v), ip = il |ir ,
                                op = ol |or s = msum(v, < ilp , olp , ip , op >) → Usage(il , ol , ir , or , s)                                                                  (5)

</pre>