<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Demonstration of ColChain: Collaborative Knowledge Chains</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Christian Aebeloe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gabriela Montoya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aalborg University</institution>
          ,
          <addr-line>Aalborg</addr-line>
          ,
          <country country="DK">Denmark</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The current architecture of the Semantic Web fully relies on the individual data providers to maintain access to their data and to keep their data up-to-date. While this may seem like a practical and straightforward solution, it often results in the data being unavailable or outdated. In this demo paper, we present a fully functioning client along with a user-friendly interface for ColChain, a system that increases availability of knowledge graphs and enables users to update the data in a community-driven way while still allowing them to query old versions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In recent years, the continuous advances of Semantic Web technologies have
led to a rapid increase in the amount of data published as Linked Open Data.
Naturally, the published information is subject to change and evolution [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]; errors
are corrected, major updates are released, etc. However, the proliferation of data
on the Web of Data, combined with the current reliance on data providers
to maintain access to their datasets and keep them up-to-date, places a
significant burden on those providers [
        <xref ref-type="bibr" rid="ref1 ref9">1, 9</xref>
        ]. As a result, SPARQL endpoints
often experience downtime [
        <xref ref-type="bibr" rid="ref4 ref8">4, 8</xref>
        ] and available data is sometimes outdated [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
        <preformat>SELECT ?pr2 WHERE {
  dbr:President_of_the_United_States dbo:incumbent ?pr1 .
  ?pr1 dbo:party ?pa .
  ?pr2 dct:subject dbr:Category:Presidents_of_the_United_States .
  ?pr2 dbo:party ?pa
}</preformat>
Listing 1: SPARQL query Q that finds former U.S. presidents of the same party
as the current (incumbent) U.S. president.
      </p>
      <p>Consider, for instance, query Q in Listing 1. Q finds all former U.S. presidents
that have been members of the same party as the current (incumbent) U.S.
president. However, as of the writing of this paper, processing Q over the latest
DBpedia release (version 2021-01)<sup>1</sup> results in ?pr1, i.e., the current (incumbent)
president, being bound to dbr:Donald_Trump although the inauguration of
President Biden took place months ago. While this is likely to be changed in
the next release, the delay in the update shows that information available on the
Semantic Web is not always up-to-date.</p>
      <p><sup>1</sup> https://www.dbpedia.org/resources/latest-core/</p>
      <p>Copyright © 2021 for this paper by its authors. Use permitted under Creative
Commons License Attribution 4.0 International (CC BY 4.0).</p>
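      <p>To make the effect of a single stale triple concrete, the following Python sketch evaluates the triple patterns of Q naively over a handful of in-memory triples. The triples are toy data written for this illustration (they are not extracted from any DBpedia release), and make_triples and evaluate_q are hypothetical helper names, not part of ColChain.</p>
      <preformat>
```python
POTUS = "dbr:President_of_the_United_States"
CATEGORY = "dbr:Category:Presidents_of_the_United_States"
REP = "dbr:Republican_Party_(United_States)"
DEM = "dbr:Democratic_Party_(United_States)"


def make_triples(incumbent):
    """Build a toy triple set (illustrative data) with the given incumbent."""
    parties = {
        "dbr:Donald_Trump": REP,
        "dbr:George_W._Bush": REP,
        "dbr:Joe_Biden": DEM,
        "dbr:Barack_Obama": DEM,
    }
    triples = {(POTUS, "dbo:incumbent", incumbent)}
    for president, party in parties.items():
        triples.add((president, "dbo:party", party))
        triples.add((president, "dct:subject", CATEGORY))
    return triples


def evaluate_q(triples):
    """Evaluate Q with nested loops and a membership test for the last pattern."""
    answers = set()
    for s1, p1, pr1 in triples:
        # Pattern 1: POTUS dbo:incumbent ?pr1
        if (s1, p1) != (POTUS, "dbo:incumbent"):
            continue
        for s2, p2, pa in triples:
            # Pattern 2: ?pr1 dbo:party ?pa
            if (s2, p2) != (pr1, "dbo:party"):
                continue
            for pr2, p3, o3 in triples:
                # Pattern 3: ?pr2 dct:subject CATEGORY
                if (p3, o3) != ("dct:subject", CATEGORY):
                    continue
                # Pattern 4: ?pr2 dbo:party ?pa
                if (pr2, "dbo:party", pa) in triples:
                    answers.add(pr2)
    return answers


stale = evaluate_q(make_triples("dbr:Donald_Trump"))
fresh = evaluate_q(make_triples("dbr:Joe_Biden"))
print(sorted(stale))
print(sorted(fresh))
```
      </preformat>
      <p>Replacing the single dbo:incumbent triple switches the answers from one party to the other. Since Q contains no filter that excludes ?pr1, the incumbent also appears among the answers whenever it is listed in the category.</p>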
      <p>
        In this paper, we demonstrate a fully functioning client along with a
user-friendly interface for ColChain (COLlaborative knowledge CHAINs) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], a
system that builds on unstructured Peer-to-Peer (P2P) networks [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and uses
replication of data fragments across several nodes to maintain high availability.
Furthermore, ColChain enables users to collaboratively update datasets while
also allowing users to process queries over previous versions. ColChain divides
the P2P network into smaller communities of nodes that collaborate on keeping
certain data up-to-date, relying on community-wide consensus. Updates in
ColChain are stored in blockchain-like chains; once consensus on an update
is reached, the update is appended to the end of the chain. This allows any user to
propose updates to any dataset while making malicious updates less likely.
Furthermore, the update chains allow users to access previous versions of the
datasets. While [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] presents the theoretical framework, this demo paper presents
a working ColChain implementation with a user-friendly interface<sup>2</sup>.
      </p>
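      <p>The idea of a blockchain-like update chain can be sketched as follows. This is a minimal illustration under stated assumptions: the paper does not specify how blocks are linked, so the SHA-256 hash linking, the GENESIS placeholder, the JSON encoding of updates, and the function names are all hypothetical choices rather than ColChain's actual design.</p>
      <preformat>
```python
import hashlib
import json

GENESIS = "0" * 64  # assumed placeholder for "no predecessor"


def block_hash(prev_hash, update):
    """Hash an update together with the hash of its predecessor."""
    payload = json.dumps({"prev": prev_hash, "update": update}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_update(chain, update):
    """Append an update (assumed to be already accepted) to the end of the chain."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    chain.append({
        "prev": prev_hash,
        "update": update,
        "hash": block_hash(prev_hash, update),
    })


def is_intact(chain):
    """Check that every block links to its predecessor and matches its own hash."""
    prev_hash = GENESIS
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(prev_hash, block["update"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
subject = "dbr:President_of_the_United_States"
append_update(chain, {
    "remove": [[subject, "dbo:incumbent", "dbr:Barack_Obama"]],
    "add": [[subject, "dbo:incumbent", "dbr:Donald_Trump"]],
})
append_update(chain, {
    "remove": [[subject, "dbo:incumbent", "dbr:Donald_Trump"]],
    "add": [[subject, "dbo:incumbent", "dbr:Joe_Biden"]],
})
print(len(chain), is_intact(chain))
```
      </preformat>
      <p>In this sketch, altering any earlier update changes its hash and breaks the link to every later block, which is one way such a chain can make retroactive tampering detectable.</p>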
      <p>This paper is structured as follows. In Section 2, we present an architectural
overview of ColChain, while in Section 3 we describe the demonstration that
will be conducted at the conference.</p>
    </sec>
    <sec id="sec-2">
      <title>System Overview</title>
      <p>
        Figure 1a shows an example of a ColChain network that consists of three nodes
that store data from two communities, where the nodes either participate in or
observe the communities. Participating nodes share a set of data fragments with
the community they participate in and collaborate on keeping those fragments
up-to-date. In Figure 1a, since node A participates in community C1, it stores
C1's fragments in its local datastore. Furthermore, A observes community C2
and thus only indexes C2's fragments; to access C2's data, it asks node B or C (i.e.,
the participants). Due to space restrictions, we do not detail aspects such as how
communities are created and maintained, but refer
the interested reader to [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] for a more technically detailed description.
      </p>
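      <p>The difference between participating and observing can be sketched as below. The Node class, its methods, and the fragment identifiers are hypothetical names chosen for this illustration; the sketch only mirrors the behavior described above (participants store fragments, observers only index them).</p>
      <preformat>
```python
class Node:
    """A node that participates in some communities and observes others."""

    def __init__(self, name):
        self.name = name
        self.datastore = {}  # fragment id -> triples stored locally
        self.index = {}      # fragment id -> names of nodes storing the fragment

    def participate(self, fragment_id, triples):
        """Store the fragment locally and index this node as a holder."""
        self.datastore[fragment_id] = set(triples)
        self.index.setdefault(fragment_id, set()).add(self.name)

    def observe(self, fragment_id, participants):
        """Only index the fragment; the data stays with the participants."""
        self.index.setdefault(fragment_id, set()).update(participants)

    def locate(self, fragment_id):
        """Return the nodes to ask for a fragment, preferring the local copy."""
        if fragment_id in self.datastore:
            return [self.name]
        return sorted(self.index.get(fragment_id, ()))


node_a = Node("A")
node_a.participate("C1/fragment", [("s", "p", "o")])
node_a.observe("C2/fragment", ["B", "C"])
print(node_a.locate("C1/fragment"))
print(node_a.locate("C2/fragment"))
```
      </preformat>
      <p>Here node A answers requests for C1's fragment from its own datastore, while requests for C2's fragment are redirected to the participants B or C.</p>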
      <p>
        ColChain relies on the consensus of participating nodes within a community
to apply updates collaboratively. In its current form, the users of at least half
of the participants in a community have to actively accept an update using our
interface. While validating a proposed update in this way entails some overhead,
we consider this acceptable since it lets users ensure that updates are factual
and non-malicious. Furthermore, as described in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], our future work includes
providing consensus protocols that make the active participation of users more
scalable, e.g., by detecting malicious updates automatically or letting fragment
owners specify a qualified majority.
      </p>
      <p><sup>2</sup> The client and source code are available at https://relweb.cs.aau.dk/colchain/</p>
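      <p>The acceptance rule described above (at least half of a community's participants accept an update) can be sketched as follows. The function name and the threshold parameter are assumptions made for this illustration; the parameter merely hints at how a qualified majority could be expressed and does not describe an existing ColChain feature.</p>
      <preformat>
```python
from fractions import Fraction


def update_accepted(participants, accepting, threshold=Fraction(1, 2)):
    """Return True once the share of accepting participants reaches the threshold."""
    members = set(participants)
    if not members:
        return False  # assumption: an empty community cannot accept anything
    votes = members.intersection(accepting)  # ignore votes from non-participants
    return len(votes) >= threshold * len(members)


community = {"A", "B", "C", "D"}
print(update_accepted(community, {"A"}))       # one of four participants
print(update_accepted(community, {"A", "B"}))  # two of four participants
```
      </preformat>
      <p>Using exact fractions instead of floating-point numbers avoids rounding surprises when the threshold is, for example, two thirds.</p>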
      <p>Figure 1a: Example ColChain network with nodes A, B, and C and communities C1 and C2.</p>
      <sec id="sec-2-2">
        <title>Architecture of a ColChain Node</title>
        <p>Figure 1b: Architecture of a ColChain node, comprising the communication layer, the processing layer, and the data storage layer.</p>
        <p>A ColChain node consists of several architectural layers, as illustrated
in Figure 1b. These layers are as follows.</p>
        <p>Communication Layer. The communication layer exposes two components:
the Web interface and the node interface. The Web interface provides a GUI
that allows users to interact with the system, e.g., to issue SPARQL queries (on
current or previous versions of the data), propose updates, and decide whether
to accept or reject updates proposed by other users. The node interface accepts
messages from other nodes, e.g., when another participant accepts an update.</p>
        <p>Processing Layer. The processing layer consists of two components: the
community manager and the query processor. The community manager validates
updates and manages chains, fragments, and community memberships.
The query processor processes SPARQL queries over the current dataset version
or over the version available at any user-specified point in time.</p>
        <p>
          Data Storage Layer. The data storage layer contains the node's local data
store. ColChain nodes use HDT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] as the backend for storing data fragments.
Changes to fragments are applied to the data storage layer by the community
manager and appended to the chain for the given fragment. The SPARQL query
processor uses the data storage layer to answer triple pattern requests.
Furthermore, the data storage layer can roll back fragments to earlier versions to allow users to
process queries over those versions.
        </p>
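        <p>Rolling a fragment back to an earlier version can be sketched by undoing the chained updates in reverse order. This is a simplified illustration, not the HDT-based implementation: it assumes that every update is effective (the removed triples were present and the added triples were absent when it was applied), and the function names and the update layout with "remove" and "add" keys are hypothetical.</p>
        <preformat>
```python
def apply_update(triples, update):
    """Return the fragment after removing and then adding the update's triples."""
    return (set(triples) - set(update["remove"])) | set(update["add"])


def undo_update(triples, update):
    """Invert an update: drop what it added and restore what it removed."""
    return (set(triples) - set(update["add"])) | set(update["remove"])


def version_at(latest, chain, version):
    """Roll the latest fragment back to its state after `version` updates."""
    triples = set(latest)
    for update in reversed(chain[version:]):
        triples = undo_update(triples, update)
    return triples


SUBJECT = "dbr:President_of_the_United_States"


def incumbent(name):
    """Build the dbo:incumbent triple for a president."""
    return (SUBJECT, "dbo:incumbent", name)


version_0 = {incumbent("dbr:Barack_Obama")}
chain = [
    {"remove": [incumbent("dbr:Barack_Obama")], "add": [incumbent("dbr:Donald_Trump")]},
    {"remove": [incumbent("dbr:Donald_Trump")], "add": [incumbent("dbr:Joe_Biden")]},
]

latest = version_0
for update in chain:
    latest = apply_update(latest, update)

print(version_at(latest, chain, 0))
print(version_at(latest, chain, len(chain)))
```
        </preformat>
        <p>Keeping only the latest fragment plus the chain is enough in this sketch to answer queries over any earlier version, at the cost of undoing updates on demand.</p>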
        <sec id="sec-2-2-1">
          <title>Graphical User Interface for ColChain</title>
          <p>Consider again query Q from Listing 1 and a user who wants to suggest
an update to obtain the expected result. Figure 2 shows how the user
interacts with ColChain to propose an update over the corresponding
fragment with dbo:incumbent as the identifier. The user searches for the URI
dbr:President_of_the_United_States (Figure 2a) and finds the triple with
dbr:Donald_Trump as object (Figure 2b), which they then remove. The user
then adds the triple with dbr:Joe_Biden as object to the fragment (Figure 2c).
Figure 2d shows the changes made by the user. Once the user saves the update
(Figure 2e), it is forwarded to the other participants in the community, which
are notified (Figure 2f). Once a majority accepts the update, it is applied across
the community, and the updated index is sent to the observers.</p>
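          <p>The proposal built in this walkthrough can be thought of as a set of removed and added triples on one fragment. The sketch below models that idea; the ProposedUpdate class and its summary format are hypothetical and only loosely mirror the change overview of Figure 2d, not the actual interface code.</p>
          <preformat>
```python
from dataclasses import dataclass, field


@dataclass
class ProposedUpdate:
    """A pending change to one fragment (hypothetical structure)."""
    fragment_id: str
    removed: list = field(default_factory=list)
    added: list = field(default_factory=list)

    def remove(self, triple):
        """Mark an existing triple for removal."""
        self.removed.append(triple)

    def add(self, triple):
        """Mark a new triple for insertion."""
        self.added.append(triple)

    def summary(self):
        """List the changes, removals first, in a diff-like notation."""
        lines = ["- " + " ".join(triple) for triple in self.removed]
        lines += ["+ " + " ".join(triple) for triple in self.added]
        return lines


subject = "dbr:President_of_the_United_States"
proposal = ProposedUpdate(fragment_id="dbo:incumbent")
proposal.remove((subject, "dbo:incumbent", "dbr:Donald_Trump"))
proposal.add((subject, "dbo:incumbent", "dbr:Joe_Biden"))
for line in proposal.summary():
    print(line)
```
          </preformat>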
        </sec>
        <sec id="sec-2-2-2">
          <title>Implementation Details</title>
          <p>
            ColChain is implemented in Java 8. The Web interface and node interface
(Figure 1b) are implemented as Java 8 servlets using Jetty<sup>3</sup>. We implemented
the query processor as an extension of Apache Jena<sup>4</sup>; thus, it can process any
SPARQL query that Jena can process (e.g., queries with UNION or OPTIONAL).
As previously mentioned, ColChain nodes use HDT [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] as the backend for storing
data fragments, as well as Prefix-Partitioned Bloom Filter (PPBF) indexes [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]
to index the fragments available locally or remotely. The chains of updates are
stored in persistent storage separately from the fragments. However, if possible,
ColChain also keeps the update chains temporarily in main memory.
          </p>
          <p><sup>3</sup> https://www.eclipse.org/jetty/</p>
          <p><sup>4</sup> https://jena.apache.org/</p>
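          <p>As a rough intuition for a prefix-partitioned Bloom filter index, the following sketch keeps one small Bloom filter per IRI prefix and stores only the local names in it. It is a deliberately simplified illustration and not the PPBF structure of [2]: the filter size, the number of hash functions, the SHA-256-based hashing, the rule for splitting IRIs, and all class and function names are assumptions.</p>
          <preformat>
```python
import hashlib


class BloomFilter:
    """A plain Bloom filter over strings (sizes are arbitrary assumptions)."""

    def __init__(self, size=256, hashes=3):
        self.size = size
        self.hashes = hashes
        self.bits = [False] * size

    def _positions(self, item):
        """Derive one bit position per hash function from seeded SHA-256."""
        for seed in range(self.hashes):
            digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, item):
        for position in self._positions(item):
            self.bits[position] = True

    def might_contain(self, item):
        """False means definitely absent; True means possibly present."""
        return all(self.bits[position] for position in self._positions(item))


def split_iri(iri):
    """Split an IRI into (prefix, local name) at the last '/' or '#'."""
    cut = max(iri.rfind("/"), iri.rfind("#")) + 1
    return iri[:cut], iri[cut:]


class PrefixPartitionedFilter:
    """One Bloom filter per IRI prefix, holding the local names."""

    def __init__(self):
        self.partitions = {}

    def add(self, iri):
        prefix, name = split_iri(iri)
        self.partitions.setdefault(prefix, BloomFilter()).add(name)

    def might_contain(self, iri):
        prefix, name = split_iri(iri)
        partition = self.partitions.get(prefix)
        return partition is not None and partition.might_contain(name)


index = PrefixPartitionedFilter()
index.add("http://dbpedia.org/resource/Joe_Biden")
print(index.might_contain("http://dbpedia.org/resource/Joe_Biden"))
print(index.might_contain("http://example.org/resource/Joe_Biden"))
```
          </preformat>
          <p>An IRI whose prefix has no partition is rejected without any hashing, while lookups within a known prefix keep the usual Bloom filter guarantee of no false negatives and occasional false positives.</p>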
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Demonstration</title>
      <p>
        At the conference, we will demonstrate ColChain using two scenarios that
attendees can explore and interact with. We will run a network with the data
from LargeRDFBench [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which comprises 13 interlinked datasets with over
a billion triples in total. Furthermore, we will run a separate network with
data from a subset of DBpedia that includes update chains back to version
2015-04, i.e., attendees will have the opportunity to explore query answers over
different versions of DBpedia. ColChain will be showcased using networks with
varying numbers of nodes and community sizes that follow different distributions
(e.g., Zipfian as in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]). A video demonstration of ColChain using the DBpedia
scenario is available on our website<sup>5</sup>.
      </p>
      <p>To ease interaction with the system, we will provide several
SPARQL queries with which attendees can explore each scenario. Attendees will be invited
to build upon these queries, formulate queries of their own, explore query
answers over different versions, and propose updates. For instance, attendees
could propose the update shown in Figure 2. Query Q from Listing 1 could then
be processed over the updated data as well as over DBpedia version 2015-04,
where ?pr1 would be bound to dbr:Barack_Obama.</p>
      <p>Acknowledgments. This research was partially funded by the Danish Council
for Independent Research (DFF) under grant agreement no. DFF-8048-00051B,
Aalborg University's Talent Programme, and the Poul Due Jensen Foundation.</p>
      <p><sup>5</sup> https://relweb.cs.aau.dk/colchain#demonstration</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aebeloe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montoya</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hose</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>A Decentralized Architecture for Sharing and Querying Semantic Data</article-title>
          .
          <source>In: ESWC 2019</source>
          . pp.
          <volume>3</volume>
          –
          <issue>18</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Aebeloe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montoya</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hose</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Decentralized Indexing over a Network of RDF Peers</article-title>
          .
          <source>In: ISWC 2019</source>
          . pp.
          <volume>3</volume>
          –
          <issue>20</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Aebeloe</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montoya</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hose</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>ColChain: Collaborative Linked Data Networks</article-title>
          .
          <source>In: WWW 2021</source>
          . pp.
          <volume>1385</volume>
          –
          <issue>1396</issue>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Aranda</surname>
            ,
            <given-names>C.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vandenbussche</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>SPARQL Web-Querying Infrastructure: Ready for Action?</article-title>
          <source>In: ISWC 2013</source>
          . pp.
          <volume>277</volume>
          –
          <issue>293</issue>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Fernandez</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martínez-Prieto</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arias</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Binary RDF representation for publication and exchange (HDT)</article-title>
          .
          <source>J. Web Semant</source>
          .
          <volume>19</volume>
          ,
          <issue>22</issue>
          –
          <fpage>41</fpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Pelgrin</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Galarraga</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hose</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Towards Fully-fledged Archiving for RDF Datasets</article-title>
          .
          <source>Semantic Web</source>
          (
          <year>2021</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Saleem</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hasnain</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          :
          <article-title>LargeRDFBench: A billion triples benchmark for SPARQL endpoint federation</article-title>
          .
          <source>J. Web Semant</source>
          . vol.
          <volume>48</volume>
          , pp.
          <volume>85</volume>
          –
          <issue>125</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Vandenbussche</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matteis</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aranda</surname>
            ,
            <given-names>C.B.</given-names>
          </string-name>
          :
          <article-title>SPARQLES: monitoring public SPARQL endpoints</article-title>
          .
          <source>Semantic Web</source>
          <volume>8</volume>
          (
          <issue>6</issue>
          ),
          <volume>1049</volume>
          –
          <fpage>1065</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sande</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herwegen</surname>
            ,
            <given-names>J.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vocht</surname>
            ,
            <given-names>L.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meester</surname>
            ,
            <given-names>B.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haesendonck</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colpaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Triple Pattern Fragments: A low-cost knowledge graph interface for the Web</article-title>
          .
          <source>J. Web Semant</source>
          .
          <fpage>37</fpage>
          -
          <issue>38</issue>
          ,
          <issue>184</issue>
          –
          <fpage>206</fpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>