<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>BLF: A Blockchain Logging Framework for Mining Blockchain Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Paul Beck</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hendrik Bockrath</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tom Knoche</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mykola Digtiar</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tobias Petrich</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniil Romanchenko</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Richard Hobeck</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Luise Pufahl</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christopher Klinkmüller</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ingo Weber</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CSIRO Data61</institution>
          ,
          <addr-line>Sydney</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Software &amp; Business Engineering</institution>
          ,
          <addr-line>Technische Universitaet Berlin, Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Technische Universitaet Berlin</institution>
          ,
          <addr-line>Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Blockchain technology is increasingly used to realize decentralized applications and execute cross-organizational processes. Understanding how an application is used and how partners and users participate is essential to avoid failures and plan improvements. This understanding can be built by analyzing logs; however, although the data is in principle available in the immutable ledger, log extraction is currently still inconvenient, slow, and subject to interpretation. In this demo, we present BLF, an extensible logging framework for decentralized applications deployed on a blockchain. The framework is realized for Ethereum and Hyperledger and has been tested for applications on those networks, but it is extensible to other blockchains. Practitioners can use it to analyze their blockchain applications, and BPM researchers can use it to explore a new type of event data: event logs from blockchain applications.</p>
      </abstract>
      <kwd-group>
        <kwd>Logging</kwd>
        <kwd>Blockchain Application</kwd>
        <kwd>Process Mining</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Blockchain technology enables a new generation of applications, commonly referred to as
decentralized applications (DApps), which can, e.g., support the execution of cross-organizational
business processes [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Whereas DApp developers have full control over their DApp’s features,
the shared nature of the networks on which the DApps are deployed limits the developers’
influence on when, where, and under what circumstances they are executed. Thus, the analysis
of DApp behavior based on logs is essential for avoiding failures and planning improvements.
Here, process mining techniques [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] support the analysis of events over time and
can provide useful insights, as shown, e.g., in [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
      </p>
      <p>[Fig. 1: overview of the main components of BLF. Labels recovered from the figure: Blockchain (Transactions, Transaction Receipts, State); DApp Source Code (execute, emit); Blockchain Logging Framework (BLF) with Manifest (BcQL), Validator, and Extractor; Output (CSV, TXT, XES).]</p>
      <p>In this demo, we present the Blockchain Logging Framework (BLF) whose main components
are summarized in Fig. 1. At heart, BLF enables the generation of logs from DApp data by
allowing users to define an extract-transform-load (ETL) pipeline. To this end, users have to
specify a manifest using BLF’s Blockchain Query Language (BcQL). To support and ease the
definition of the manifests, BLF’s validator can verify their correctness and inform users about
potential specification errors. The framework itself is an extension of the Ethereum Logging
Framework [
        <xref ref-type="bibr" rid="ref4 ref5">4, 5</xref>
        ] (ELF) which was developed and tested for Ethereum [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. BLF, by contrast, is
designed as a generic framework for generating logs from DApps on any blockchain platform.
While BLF currently provides adapters for Ethereum and Hyperledger [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], it is extensible so
that other platforms can be supported in the future. In the remainder, we present the main
functionality, its query language, and the possibility to extend it. We conclude by outlining case
studies and a small demonstration for a Hyperledger DApp.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Main Functionality of BLF</title>
      <p>BLF is written in Java and its source code is publicly available<sup>1</sup>. It consists of three parts: BcQL,
the validator, and the extractor. In the following, we briefly present how BcQL can be applied to
specify manifest files and thus ETL pipelines for DApps. After that, we summarize the extractor
and validator functionality, which builds upon manifest files (see Fig. 1). Lastly, we elaborate on
possibilities to extend the framework.</p>
      <p>Manifest and BcQL. A manifest file defines how logs are generated from DApp data. This
includes details regarding which blockchain to connect to, what data to extract, how to structure
the data, and where to store it in which format. To this end, BcQL is designed as a declarative
query language that abstracts away low-level extraction details like data decoding, composition
of API calls, etc., so that developers can focus on defining the actual ETL process.</p>
      <sec id="sec-2-1">
        <p><fn id="fn1"><label>1</label><p>https://github.com/TU-ADSP/Blockchain-Logging-Framework</p></fn></p>
        <p>Fig. 2 shows the structure of a manifest document with its mandatory and optional
elements. In each manifest, the user first has to define the blockchain context (e.g., “ETHEREUM” or
“HYPERLEDGER”), a connection to a blockchain node, and an output folder. In addition,
BcQL gives the option to specify (1) filters (e.g., block filter, transaction filter), which allow
users to narrow down the DApp data to be extracted, (2) expression statements, which provide
transformation and logic operators that can be used to process the data, and (3) emit statements
for formatting data in a specific target format. The user may also configure the emission
mode and the error handling strategy. By default, the output files are emitted as
soon as all data of a blockchain application has been extracted, transformed, and loaded. Safe
batching and streaming instead emit data for each processed block: the former
continuously updates the main output files, whereas the latter produces new files for each block.
Regarding error handling, BLF can handle runtime errors in two distinct ways: errors
are either ignored, or they abort the current execution. Either way, the errors
are printed to the console and written to an error log file. A detailed step-by-step tutorial
on how to write a manifest is provided on the project website<sup>2</sup>.</p>
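        <p>The emission modes and error handling strategies described above can be illustrated with a minimal Java sketch. The names EmissionMode, ErrorStrategy, and EmissionSketch, as well as the output file names, are hypothetical and are not taken from BLF's code base; the sketch only mirrors the behavior stated in the text (one emission at the end by default, an update of the main output per block for safe batching, and a new file per block for streaming).</p>
        <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names; this is an illustration, not BLF's actual API.
enum EmissionMode { DEFAULT, SAFE_BATCHING, STREAMING }

enum ErrorStrategy { IGNORE, ABORT }

final class EmissionSketch {

    /**
     * Returns the file writes that would occur while processing the given
     * block numbers in order. Each entry has the form "<file>@<block>".
     */
    static List<String> plannedWrites(EmissionMode mode, List<Long> blocks) {
        List<String> writes = new ArrayList<>();
        for (long block : blocks) {
            switch (mode) {
                case SAFE_BATCHING:
                    // The main output file is updated after every block.
                    writes.add("log.csv@" + block);
                    break;
                case STREAMING:
                    // A new output file is produced for every block.
                    writes.add("log_" + block + ".csv@" + block);
                    break;
                default:
                    // DEFAULT: nothing is written while blocks are processed.
                    break;
            }
        }
        if (mode == EmissionMode.DEFAULT && !blocks.isEmpty()) {
            // Everything is emitted once, after the last block.
            writes.add("log.csv@" + blocks.get(blocks.size() - 1));
        }
        return writes;
    }

    /**
     * Applies the error strategy: the error is always recorded, and it is
     * rethrown only if the strategy is ABORT.
     */
    static boolean handle(ErrorStrategy strategy, RuntimeException error, List<String> errorLog) {
        errorLog.add(error.getMessage()); // stands in for console and error log file
        if (strategy == ErrorStrategy.ABORT) {
            throw error;
        }
        return true; // execution continues
    }

    public static void main(String[] args) {
        System.out.println(plannedWrites(EmissionMode.STREAMING, List.of(1L, 2L)));
    }
}
```
]]></preformat>
        <p>In this sketch the error is recorded under both strategies, which corresponds to the statement that errors are reported either way; only the decision to continue or stop differs.</p>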
        <p>
          Validator and Extractor. The main components of BLF are the validator and the extractor
which process user-defined manifest files (see Fig. 1). Here, we briefly summarize these two
components. The interested reader can find more information on these functions in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <p><fn id="fn2"><label>2</label><p>https://github.com/TU-ADSP/Blockchain-Logging-Framework/wiki/Manifest</p></fn></p>
        <p>The validator supports the user by checking the manifest for specification errors. In the first
step, the parser generator library ANTLR4 parses a textual manifest file based on a specification
of BcQL’s grammar. In doing so, ANTLR4 generates an intermediate representation of the
manifest and identifies syntactic errors. Semantic analysis of the intermediate representation
is implemented as a set of custom rules that, e.g., check whether filters are correctly nested and whether
variable, parameter, and literal types are compatible.</p>
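        <p>To make the idea of a custom semantic rule more concrete, the following Java sketch checks type compatibility for a list of simplified assignments. It is an illustration under stated assumptions: the class TypeRuleSketch, the Assignment type, and the type names are hypothetical, and the sketch operates on plain Java objects rather than on the intermediate representation that ANTLR4 produces for a BcQL manifest.</p>
        <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of a single semantic rule; not BLF's actual validator code.
final class TypeRuleSketch {

    /** A simplified statement "variable = value", reduced to the value's type. */
    static final class Assignment {
        final String variable;
        final String valueType;

        Assignment(String variable, String valueType) {
            this.variable = variable;
            this.valueType = valueType;
        }
    }

    /**
     * Checks every assignment against the declared variable types and
     * returns one message per violation (an empty list means "valid").
     */
    static List<String> check(Map<String, String> declaredTypes, List<Assignment> assignments) {
        List<String> errors = new ArrayList<>();
        for (Assignment a : assignments) {
            String declared = declaredTypes.get(a.variable);
            if (declared == null) {
                errors.add("Unknown variable: " + a.variable);
            } else if (!declared.equals(a.valueType)) {
                errors.add("Type mismatch for " + a.variable
                        + ": expected " + declared + ", found " + a.valueType);
            }
        }
        return errors;
    }

    public static void main(String[] args) {
        Map<String, String> declared = new LinkedHashMap<>();
        declared.put("kittyId", "int");
        declared.put("owner", "address");
        List<Assignment> assignments = List.of(
                new Assignment("kittyId", "int"),
                new Assignment("owner", "string"));
        check(declared, assignments).forEach(System.out::println);
    }
}
```
]]></preformat>
        <p>Collecting all violations instead of stopping at the first one is a design choice of this sketch; it lets a user fix several specification errors in one pass.</p>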
        <p>
          With a validated manifest, the extractor can extract, transform, and load data from
DApps. The framework extracts data block by block in historical order, i.e., in the order in which the blocks were
created and included in the blockchain. During the extraction, the specified filters are applied.
For transformation, BLF provides a basic set of operators and additionally allows users to
integrate custom operators at compile time of the Ethereum Logging Framework. Finally, data
can be formatted and exported as textual application logs, or in the comma-separated values
(CSV) and the eXtensible Event Stream (XES) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] format.
        </p>
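        <p>The extraction order and the role of filters can be sketched in a few lines of Java. The example below is a simplified illustration, not BLF's extractor: the class ExtractionSketch, the Tx type, and the CSV row layout are assumptions made for this sketch, and the ledger is an in-memory list rather than data fetched from a blockchain node.</p>
        <preformat><![CDATA[
```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch; class and field names are not taken from BLF.
final class ExtractionSketch {

    /** A minimal stand-in for a transaction stored in a block. */
    static final class Tx {
        final long blockNumber;
        final int indexInBlock;
        final String recipient;

        Tx(long blockNumber, int indexInBlock, String recipient) {
            this.blockNumber = blockNumber;
            this.indexInBlock = indexInBlock;
            this.recipient = recipient;
        }
    }

    /**
     * Visits transactions in historical order (by block number, then by
     * position within the block), applies a block range filter and a
     * transaction filter, and emits one CSV row per remaining transaction.
     */
    static List<String> extract(List<Tx> ledger, long fromBlock, long toBlock, Predicate<Tx> txFilter) {
        List<Tx> ordered = new ArrayList<>(ledger);
        ordered.sort(Comparator.comparingLong((Tx t) -> t.blockNumber)
                .thenComparingInt(t -> t.indexInBlock));
        List<String> rows = new ArrayList<>();
        for (Tx tx : ordered) {
            boolean inRange = tx.blockNumber >= fromBlock && tx.blockNumber <= toBlock;
            if (inRange && txFilter.test(tx)) {
                rows.add(tx.blockNumber + "," + tx.indexInBlock + "," + tx.recipient);
            }
        }
        return rows;
    }

    public static void main(String[] args) {
        List<Tx> ledger = List.of(new Tx(11, 0, "0xabc"), new Tx(10, 0, "0xabc"));
        extract(ledger, 10, 11, tx -> tx.recipient.equals("0xabc")).forEach(System.out::println);
    }
}
```
]]></preformat>
        <p>Sorting by block number and then by position within the block keeps the emitted rows in the order in which the transactions were included, which matters when the resulting log is later analyzed with process mining tools.</p>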
        <p>Extending BLF. The interaction between BLF and a blockchain node takes place through a
standardized interface, called BaseBlockchainListener (see Fig. 3). By implementing this interface
for specific blockchain platforms, e.g., for Corda R3 or EOS, developers can add support for
additional platforms. Currently, BLF provides standard implementations for Hyperledger and
Ethereum. Besides implementing functionality to extract data from blocks, transactions, log entries, and the
blockchain state, developers must declare the default variables that these entities have. For
example, on Ethereum, blocks are identified by a block number, i.e., the position of the block in
the blockchain, while transactions and log entries are identified by indices that encode their
position within a block. Developers can follow the step-by-step tutorial<sup>3</sup> to integrate further
platforms.</p>
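        <p>The following Java sketch illustrates what a platform adapter has to provide according to the description above. It does not reproduce BaseBlockchainListener, whose method signatures are not given here: the interface PlatformAdapterSketch, the class EthereumLikeAdapter, and the variable names are hypothetical and serve only to show how default variables for blocks, transactions, and log entries could be declared.</p>
        <preformat><![CDATA[
```java
import java.util.List;
import java.util.Map;

// Hypothetical interface for illustration. BLF's real extension point is
// BaseBlockchainListener; its actual methods are not reproduced here.
interface PlatformAdapterSketch {

    /** Name of the platform, as it might appear in a manifest. */
    String platformName();

    /** Default variables per entity type (illustrative names only). */
    Map<String, List<String>> defaultVariables();

    /** Identifies a transaction by its block and its position within that block. */
    default String transactionId(long blockNumber, int indexInBlock) {
        return blockNumber + ":" + indexInBlock;
    }
}

// An adapter for an Ethereum-like platform: blocks are identified by number,
// transactions and log entries by their index within a block.
final class EthereumLikeAdapter implements PlatformAdapterSketch {

    @Override
    public String platformName() {
        return "ETHEREUM_LIKE";
    }

    @Override
    public Map<String, List<String>> defaultVariables() {
        return Map.of(
                "block", List.of("block.number"),
                "transaction", List.of("tx.transactionIndex"),
                "logEntry", List.of("log.logIndex"));
    }

    public static void main(String[] args) {
        PlatformAdapterSketch adapter = new EthereumLikeAdapter();
        System.out.println(adapter.platformName() + " " + adapter.transactionId(100L, 2));
    }
}
```
]]></preformat>
        <p>A real adapter would additionally connect to a node and fetch the entities; the tutorial referenced above describes the actual steps.</p>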
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Demonstration and Maturity</title>
      <p>
        ELF, the predecessor of BLF, was already used in several case studies, amongst others to examine
the popular Ethereum game CryptoKitties [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and the Ethereum DApp Augur [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a popular
prediction and betting market. These case studies demonstrate real-world applications of the
framework and show how it can be used. After reworking ELF into the extensible BLF, we
again wrote a BcQL manifest for Augur, ran BLF, and successfully demonstrated BLF’s continued
capability to extract event logs from Ethereum DApps.
      </p>
      <p>The availability of Hyperledger use cases from production environments is low, because
Hyperledger is used as a private permissioned blockchain and has no public blockchain system.
Thus, we implemented our own DApp on a Hyperledger node: HyperKitties<sup>4</sup>, a Hyperledger
reimplementation of the CryptoKitties Ethereum smart contract. By porting CryptoKitties to
Hyperledger, we wanted to test whether the event log generation from Hyperledger results in
reasonable event logs as observed in previous case studies. We used HyperKitties to write events
into our private Hyperledger blockchain. We then created a manifest file that lets BLF connect
to the local Hyperledger node, extract the HyperKitties events from the blocks, and generate an
event log. We opened it in Disco, where we confirmed that it was a reasonable event log, similar to the
Ethereum result. Examples of how to write a manifest and a screencast are provided on the
main project website (see Footnote 1).</p>
      <p><fn id="fn3"><label>3</label><p>https://github.com/TU-ADSP/Blockchain-Logging-Framework/wiki/Adding-a-new-Blockchain-to-the-BLF</p></fn></p>
      <p>This demo presented a framework to log data of blockchain-based applications. It provides
functions to extract, transform, and load data. The framework additionally supports different
modes for data emission and exception handling. It has already been applied in larger case
studies and offers BPM researchers a new source of data for process mining. Although BLF is usable
on production systems, its maturity should still be classified as that of a fully functional research
prototype. In the future, we want to extend it to other blockchain technologies and improve the
usability of BcQL’s grammar.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>X.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Staples</surname>
          </string-name>
          ,
          <article-title>Architecture for blockchain applications</article-title>
          , Springer,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>van der Aalst</surname>
          </string-name>
          , Process mining - Data science in action, Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Hobeck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Klinkmüller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. D.</given-names>
            <surname>Bandara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          , W. van der Aalst,
          <article-title>Process mining on blockchain data: A case study of augur</article-title>
          ,
          <source>in: BPM 2021</source>
          , accepted, Springer,
          <year>2021</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Klinkmüller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ponomarev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          , W. van der Aalst,
          <article-title>Mining blockchain processes: Extracting process mining data from blockchain applications</article-title>
          ,
          <source>in: BPM 2019: Blockchain Forum</source>
          , Springer,
          <year>2019</year>
          , pp.
          <fpage>71</fpage>
          -
          <lpage>86</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>C.</given-names>
            <surname>Klinkmüller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Weber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Ponomarev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. B.</given-names>
            <surname>Tran</surname>
          </string-name>
          , W. van der Aalst,
          <article-title>Efficient logging for blockchain applications</article-title>
          , arXiv preprint arXiv:2001.10281 (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>G.</given-names>
            <surname>Wood</surname>
          </string-name>
          , et al.,
          <article-title>Ethereum: A secure decentralised generalised transaction ledger</article-title>
          ,
          <source>Ethereum project yellow paper 151</source>
          (
          <year>2014</year>
          )
          <fpage>1</fpage>
          -
          <lpage>32</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>E.</given-names>
            <surname>Androulaki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Bortnikov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Cachin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Christidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>De Caro</surname>
          </string-name>
          , et al.,
          <article-title>Hyperledger fabric: a distributed operating system for permissioned blockchains</article-title>
          , in: EuroSys conference,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>