<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Decentralizing the Semantic Web through Incentivized Collaboration</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>IDLab, Dep. of Electronics and Information Systems, Ghent University - imec</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Personal data is being centralized at an unprecedented scale, with widely known and far-reaching consequences, as the recent data scandals involving companies such as Equifax and Facebook have shown. Decentralizing personal data storage allows people to take back control of their data, and Semantic Web technologies can facilitate data integration at runtime. However, processing decentralized data requires far more expensive algorithms, while individual stores have considerably less processing power than large-scale data centers. This article presents a vision in which nodes in decentralized networks are incentivized to collaborate on data processing using a distributed ledger. By leveraging the collective processing capacity of all nodes, we can provide a sustainable alternative to the current generation of centralized solutions, and thereby put people back in control without compromising on functionality.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>[Figure: applications (photo gallery, social feed, meeting scheduler, photo editor, document editor) and personal data stores (my personal data storage with agenda, pictures, and contacts; Alice’s, Ben’s, and Carol’s storage).]</p>
      <p>
        Such wide cross-application interoperability without strong prior agreements can
be achieved by encoding semantics along with data and queries, as is possible with
Semantic Web technologies such as RDF and SPARQL. Data can be represented through
a choice of widely used and custom ontologies. Every person is free to pick their
ontologies and, because the semantics are explicit, reasoning can bridge ontological differences.
In other words, the decentralized aspects of Linked Data and the uncoordinated nature
of RDFS and OWL ontologies are a good fit for such scenarios [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
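      <p>As a minimal illustration of how reasoning can bridge ontological differences, the Python sketch below applies a single RDFS entailment rule (rdfs7, for <monospace>rdfs:subPropertyOf</monospace>) to triples represented as plain tuples. The <monospace>ex:</monospace> namespace, the <monospace>ex:friendOf</monospace> property, and the mapping triple are hypothetical; a real application would rely on an RDF library and a full reasoner rather than on this one rule.</p>
      <preformat preformat-type="code">
```python
# Triples are (subject, predicate, object) tuples of prefixed names.
SUB_PROPERTY_OF = "rdfs:subPropertyOf"


def rdfs7_closure(triples):
    """Apply RDFS rule rdfs7 until a fixpoint is reached:
    from (p rdfs:subPropertyOf q) and (s p o), infer (s q o).
    Chains of subproperties are handled by iterating."""
    inferred = set(triples)
    while True:
        # Map each property to its declared superproperties.
        supers = {}
        for s, p, o in inferred:
            if p == SUB_PROPERTY_OF:
                supers.setdefault(s, set()).add(o)
        # Derive triples that are not known yet.
        new = {(s, q, o)
               for s, p, o in inferred
               for q in supers.get(p, ())
               if (s, q, o) not in inferred}
        if not new:
            return inferred
        inferred |= new


# Alice describes friendship with a custom (hypothetical) ex: ontology,
# Carol uses FOAF, and a published mapping relates the two properties.
alice_store = {("ex:alice", "ex:friendOf", "ex:ben")}
carol_store = {("ex:carol", "foaf:knows", "ex:ben")}
mapping = {("ex:friendOf", SUB_PROPERTY_OF, "foaf:knows")}

closed = rdfs7_closure(alice_store | carol_store | mapping)
# A single foaf:knows lookup now matches data from both stores.
knows_ben = sorted(s for s, p, o in closed
                   if p == "foaf:knows" and o == "ex:ben")
print(knows_ben)  # ['ex:alice', 'ex:carol']
```
      </preformat>
      <p>Neither store had to adopt the other’s vocabulary: one published mapping triple sufficed for a lookup phrased in FOAF to find both friendships.</p>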
    </sec>
    <sec id="sec-2">
      <title>Performance problems of decentralization</title>
      <p>Compared to centralized systems, decentralized systems face a double disadvantage:
individual nodes are not only solving a harder problem, they are doing so with far
fewer resources. On the one hand, algorithms for decentralized data processing require
significantly more processing power and network bandwidth than their centralized
counterparts, because the data is heterogeneous and distributed. On the other hand, each individual
node in the network, be it a data store or a client running an application, possesses far
less computational power and bandwidth than large centralized data centers.</p>
      <p>Furthermore, many of our data processing algorithms are not prepared for the scale
of decentralization entailed by full data ownership. As a simple but realistic example,
building the social media feed of a person with 500 friends requires executing a query
over 500 different data sources in the worst case, where each of those friends stores their
data at a different location. State-of-the-art federated SPARQL query engines consider
use cases of a dozen large datasets with entirely different data shapes. In contrast,
decentralized data storage will require federated queries over hundreds of small datasets
with highly similar shapes. Current summarization and source selection strategies, crucial
to federated performance, are not designed to function under such conditions.</p>
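      <p>The Python sketch below makes the source selection problem tangible under stated assumptions: a naive strategy sends one probe per triple pattern per source (in the spirit of ASK-based selection), and the callback <monospace>has_match</monospace>, the <monospace>ex:</monospace> vocabulary, and the store URLs are hypothetical stand-ins for real SPARQL requests.</p>
      <preformat preformat-type="code">
```python
def select_sources(patterns, sources, has_match):
    """Probe every source once per triple pattern and keep the sources
    that report a match. Returns (relevant sources per pattern, probes sent)."""
    relevant = {}
    probes = 0
    for pattern in patterns:
        relevant[pattern] = []
        for source in sources:
            probes += 1  # one request per (pattern, source) pair
            if has_match(source, pattern):
                relevant[pattern].append(source)
    return relevant, probes


# 500 friends, each with their own store (hypothetical URLs).
sources = [f"https://friend{i}.example/" for i in range(500)]
# Triple patterns of a feed query over a hypothetical ex: vocabulary.
patterns = ["?post a ex:Post",
            "?post ex:author ?friend",
            "?post ex:published ?date"]


# Highly similar shapes: every store can match every pattern.
def same_shape(source, pattern):
    return True


relevant, probes = select_sources(patterns, sources, same_shape)
print(probes)                      # 1500
print(len(relevant[patterns[0]]))  # 500
```
      </preformat>
      <p>For a dozen large datasets with entirely different shapes, many probes would report no match and prune the query plan; with 500 stores of the same shape, all 1,500 probes are spent before any data is retrieved, and nothing is pruned.</p>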
      <p>
        Finally, exposing personal data storage through query endpoints comes with challenges
of its own. Federated SPARQL query engines are usually benchmarked in private networks.
On the public Web, SPARQL endpoints have long suffered from availability problems [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ],
and regardless of whether the causes are technological or managerial, there is a
non-negligible risk that such problems would also affect at least some
personal data stores. While less expressive query interfaces have shown promise on
public networks [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], as data becomes spread across an increasing number of nodes, we
can expect high bandwidth usage and correspondingly slower queries.
      </p>
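      <p>To give a rough sense of the bandwidth concern, the following Python sketch models a client of a paged, single-triple-pattern interface in the spirit of Triple Pattern Fragments [<xref ref-type="bibr" rid="ref7">7</xref>]. It is a simplification, not the actual protocol (which exchanges hypermedia documents over HTTP); the <monospace>fragment</monospace> function, the page size of 100, and the synthetic store contents are assumptions.</p>
      <preformat preformat-type="code">
```python
def fragment(store, pattern, page, page_size):
    """Hypothetical server call: one page of the triples in `store` that match
    a single triple pattern (None acts as a variable), plus the total count."""
    matches = [t for t in store
               if all(want is None or want == got
                      for want, got in zip(pattern, t))]
    start = page * page_size
    return matches[start:start + page_size], len(matches)


def fetch_pattern(stores, pattern, page_size=100):
    """Client side: request pages from each store until all matches are in.
    Returns the triples and the number of requests sent."""
    results, requests = [], 0
    for store in stores:
        page = 0
        while True:
            triples, total = fragment(store, pattern, page, page_size)
            requests += 1
            results.extend(triples)
            page += 1
            if page * page_size >= total:
                break
    return results, requests


# 500 stores, each holding 250 posts by its owner (synthetic data).
stores = [[(f"post{i}-{j}", "ex:author", f"friend{i}") for j in range(250)]
          for i in range(500)]
triples, requests = fetch_pattern(stores, (None, "ex:author", None))
print(requests, len(triples))  # 1500 125000
```
      </preformat>
      <p>Each store costs at least one round trip, even when it holds few matches, so the number of requests grows with the number of nodes and not only with the size of the answer.</p>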
    </sec>
    <sec id="sec-collaboration">
      <title>Leveraging strength in numbers through collaboration</title>
      <p>Decentralized networks have a particular asset: even though individual nodes have limited
resources compared to large-scale server clusters, collectively, these nodes possess a far
larger amount of computational power and bandwidth. Every personal data store,
as well as every client (computers, smartphones, tablets, …), brings its own CPUs,
which are typically underused in a centralized environment. If we find a way for these
nodes to collaborate, we can address the resource problem of decentralized networks. If we
take optimization measures, such as performing preparatory work on the nodes closest to
the data, we can counter the increased complexity of decentralized algorithms.</p>
      <p>
        Let us apply this insight to the data gathering phase of applications, which in
a decentralized network amounts to federated query evaluation. A straightforward query
to collect the recent activity of one’s contacts would involve the application sending
subqueries to each of those contacts’ data stores. However, social media networks typically
contain overlapping clusters of people, so any person on a contact list is likely to have
a subset of that list as contacts too. Therefore, we can set up agreements along the lines
of “I will help you execute your query if you help me execute mine”. Then instead of
sending subqueries to, for instance, 500 contact nodes, we can delegate larger subqueries
to 10 or 20 hubs in parallel. Instead of executing data gathering entirely at the server or
the client [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], we thus dynamically redistribute query execution across the network.
      </p>
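      <p>Choosing the 10 or 20 hubs can be framed as a set cover problem: select contacts whose reach jointly covers every store the query needs. The Python sketch below uses the classic greedy heuristic for set cover; the contact graph is synthetic, and the assumption that a hub can query all stores in its reach on our behalf stands in for the reciprocal agreements described above.</p>
      <preformat preformat-type="code">
```python
def choose_hubs(targets, reach, max_hubs):
    """Greedy set cover: repeatedly pick the candidate hub whose reach covers
    the most still-uncovered target stores.

    targets  -- stores whose data the query needs (e.g. all contacts)
    reach    -- maps each candidate hub to the set of stores it is assumed
                to be able to query on our behalf (hypothetical agreement)
    max_hubs -- upper bound on the number of hubs to delegate to
    Returns the chosen hubs and the targets that remain uncovered."""
    uncovered = set(targets)
    hubs = []
    for _ in range(max_hubs):
        if not uncovered:
            break
        # Best candidate by number of newly covered stores; iterating over
        # sorted names makes ties resolve deterministically.
        best = max(sorted(reach),
                   key=lambda hub: len(reach[hub].intersection(uncovered)))
        gain = reach[best].intersection(uncovered)
        if not gain:
            break  # no candidate can cover anything else
        hubs.append(best)
        uncovered -= gain
    return hubs, uncovered


# Synthetic contact list: 500 contacts in 20 friend clusters of 25 people,
# where every member can reach the stores of their whole cluster.
contacts = [f"friend{i}" for i in range(500)]
clusters = [contacts[k:k + 25] for k in range(0, 500, 25)]
reach = {member: set(cluster) for cluster in clusters for member in cluster}

hubs, left = choose_hubs(contacts, reach, max_hubs=20)
print(len(hubs), len(left))  # 20 0
hubs, left = choose_hubs(contacts, reach, max_hubs=10)
print(len(hubs), len(left))  # 10 250
```
      </preformat>
      <p>Greedy selection is a standard approximation for set cover with a logarithmic guarantee; on real, overlapping social graphs, the number of hubs needed would depend on how strongly contacts are clustered.</p>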
    </sec>
    <sec id="sec-incentives">
      <title>Providing incentivization and trust through distributed ledgers</title>
      <p>
        In order to reach sustainable collaborations, nodes need to be incentivized to act as
contributors to the network. Otherwise, a node cannot be sure that, if it helps other nodes
while idle, the others will return the favor when needed. However, incentives also give
nodes a reason to behave dishonestly, so we will need a trust mechanism
to verify whether work was completed correctly. In the absence of a centralized entity
in the network, such incentives and trust need to be established through decentralized
consensus. This is possible through distributed ledgers [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which can keep track of the
work performed and hence of the right to receive help from others.
      </p>
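      <p>To illustrate what keeping track of work performed could look like, here is a deliberately simplified, single-process Python sketch of an append-only, hash-chained log of work credits. It omits consensus, signatures, and proofs, which a real distributed ledger would need; the record fields and the rule that a node’s balance equals the work it performed minus the work it requested are assumptions.</p>
      <preformat preformat-type="code">
```python
import hashlib
import json


class WorkLedger:
    """Append-only log in which each entry stores the hash of its predecessor,
    so rewriting an earlier entry invalidates every later hash."""

    def __init__(self):
        self.entries = []

    def _digest(self, body):
        # Canonical JSON (sorted keys) so equal bodies hash identically.
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def record(self, worker, requester, units):
        """Register that `worker` performed `units` of work for `requester`."""
        prev = self.entries[-1]["hash"] if self.entries else "0" * 64
        body = {"worker": worker, "requester": requester,
                "units": units, "prev": prev}
        self.entries.append({**body, "hash": self._digest(body)})

    def verify(self):
        """Recompute every hash and check that each entry links to its predecessor."""
        prev = "0" * 64
        for entry in self.entries:
            body = {k: v for k, v in entry.items() if k != "hash"}
            if body["prev"] != prev or self._digest(body) != entry["hash"]:
                return False
            prev = entry["hash"]
        return True

    def balance(self, node):
        """Credits earned by helping others minus credits spent on receiving help."""
        earned = sum(e["units"] for e in self.entries if e["worker"] == node)
        spent = sum(e["units"] for e in self.entries if e["requester"] == node)
        return earned - spent


ledger = WorkLedger()
ledger.record(worker="ben", requester="alice", units=3)
ledger.record(worker="alice", requester="carol", units=1)
print(ledger.balance("alice"), ledger.verify())  # -2 True
ledger.entries[0]["units"] = 30  # tampering with history
print(ledger.verify())           # False
```
      </preformat>
      <p>Changing any earlier record breaks the hash chain, which lets participants detect tampering; deciding which chain is authoritative is the part that requires decentralized consensus.</p>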
      <p>
        Blockchains are one category of distributed ledgers [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]; they require a proof in
order to append something to the ledger. Whereas the popular Bitcoin ledger is known for using an
essentially meaningless computation as proof-of-work, newer types of ledgers such as
Filecoin [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] introduce more meaningful purposes for this proof. With Filecoin, people can
pay others to securely store and retrieve their data, and a proof-of-replication confirms that
the data is there at all times. We would similarly need to develop a proof-of-query-results
that captures both the work performed and the correctness of the results.
      </p>
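      <p>No proof-of-query-results exists yet, so the Python sketch below shows only a naive baseline that such a proof would need to improve upon: the worker commits to a hash of its results in a canonical order, and a verifier re-executes the query with some probability and compares commitments. The function names, the <monospace>audit_rate</monospace> parameter, and the toy query engine are hypothetical.</p>
      <preformat preformat-type="code">
```python
import hashlib
import random


def commit(results):
    """Hash of the results in a canonical (sorted) order, so that equal
    result sets yield equal commitments regardless of production order."""
    canonical = "\n".join(sorted(repr(r) for r in results))
    return hashlib.sha256(canonical.encode()).hexdigest()


def audit(query, claimed_commitment, execute, audit_rate, rng):
    """With probability audit_rate, re-execute the query and check the claim.
    Returns None when no audit took place, otherwise True or False."""
    if rng.random() >= audit_rate:
        return None
    return commit(execute(query)) == claimed_commitment


# A toy "query engine": select the posts of one author from a fixed dataset.
DATA = [("post1", "ben"), ("post2", "carol"), ("post3", "ben")]


def execute(author):
    return [post for post, who in DATA if who == author]


honest = commit(["post3", "post1"])  # same set, different order
cheating = commit(["post1"])         # an incomplete answer
rng = random.Random(0)
print(audit("ben", honest, execute, 1.0, rng))    # True
print(audit("ben", cheating, execute, 1.0, rng))  # False
```
      </preformat>
      <p>Every audit costs as much as the original work, and a lower <monospace>audit_rate</monospace> only catches incorrect results probabilistically; a dedicated proof would aim to make verification much cheaper than execution.</p>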
      <p>Figure 2 shows the architectural components of an individual node in the network.
When a query arrives, the node determines which incentive it would accept for executing it, and which
incentives it is willing to pay others for subquery delegation. After possibly delegating
some parts and performing the remaining work itself, it maintains the provenance of the
data and generates a correctness proof of the results. Transactions are registered on
the blockchain, such that all participants receive their reward. Some nodes might also perform
preparatory work, such as precomputing partial results of common queries in
the network, or locally caching other stores’ data to speed up querying.</p>
      <p>[Figure 2 labels: Query Processor (planning, selection), discovery, Provenance &amp; Proof, blockchain, tasks, collaborative ecosystem of clients and stores.]</p>
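      <p>The per-node workflow described above can be sketched in Python as follows. The sketch is illustrative only: the pricing rule (decline a query whose reward does not cover the cheapest plan), the delegation rule (delegate a subquery when a peer asks less than the local cost), and the names <monospace>Node</monospace>, <monospace>Offer</monospace>, and <monospace>handle</monospace> are hypothetical; provenance is reduced to recording which node produced each result, delegated work is simulated locally, and a Python list stands in for the blockchain.</p>
      <preformat preformat-type="code">
```python
from dataclasses import dataclass, field


@dataclass
class Offer:
    """A peer willing to evaluate a subquery for a given price (hypothetical)."""
    peer: str
    price: float


@dataclass
class Node:
    name: str
    cost_per_subquery: float
    ledger: list = field(default_factory=list)  # stand-in for the blockchain

    def handle(self, subqueries, reward, offers, run_local):
        """Plan each subquery (cheapest peer if it undercuts the local cost),
        decline if the reward does not cover the plan, otherwise execute and
        return provenance as (subquery, producer, results) triples."""
        plan = []
        for sq in subqueries:
            cheapest = min(offers.get(sq, []), key=lambda o: o.price, default=None)
            if cheapest is not None and self.cost_per_subquery > cheapest.price:
                plan.append((sq, cheapest.peer, cheapest.price))
            else:
                plan.append((sq, self.name, self.cost_per_subquery))
        total_cost = sum(price for _, _, price in plan)
        if total_cost > reward:
            return None  # incentive too low: decline the query
        provenance = []
        for sq, producer, price in plan:
            # Delegated work is simulated by calling run_local in this sketch.
            provenance.append((sq, producer, run_local(sq)))
            if producer != self.name:
                self.ledger.append((self.name, producer, price))  # pay the peer
        return provenance


node = Node("alice", cost_per_subquery=2.0)
offers = {"posts-of-ben": [Offer("ben", 0.5), Offer("dave", 1.0)]}
result = node.handle(["posts-of-ben", "posts-of-carol"], reward=5.0,
                     offers=offers, run_local=lambda sq: [sq + "-result"])
print([producer for _, producer, _ in result])  # ['ben', 'alice']
print(node.ledger)  # [('alice', 'ben', 0.5)]
print(node.handle(["a", "b", "c"], reward=5.0, offers={}, run_local=list))  # None
```
      </preformat>
      <p>Correctness proofs, the discovery of offers, and registration on an actual ledger are deliberately left out of this sketch.</p>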
    </sec>
    <sec id="sec-3">
      <title>Projected impact</title>
      <p>
        This idea goes beyond data marketplaces [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] by essentially proposing a service marketplace
between nodes in a decentralized semantic data network. While the example applies this
to query execution over personal data, other kinds of services can be auctioned as well,
such as reasoning to convert data to different ontologies. All such applications rely on
the principle that client CPUs are idle most of the time: by allowing others to use them
when we do not need them, we can rely on others at the moment we need help ourselves.
      </p>
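      <p>As one possible auction format, the Python sketch below implements a sealed-bid reverse auction with a second-price (Vickrey-style) payment rule: the lowest bidder wins the task and is paid the second-lowest bid, which makes bidding one’s true cost a dominant strategy. This choice of mechanism is an assumption of the sketch, not part of the proposal, and the bidders and prices are invented.</p>
      <preformat preformat-type="code">
```python
def reverse_vickrey(bids):
    """Sealed-bid reverse auction for one task.
    `bids` maps bidder -> asked price. The lowest bidder wins (ties broken
    by name) and is paid the second-lowest price, or its own bid if it is
    the only bidder. Returns None when there are no bids."""
    if not bids:
        return None
    ranked = sorted(bids.items(), key=lambda item: (item[1], item[0]))
    winner, own_bid = ranked[0]
    payment = ranked[1][1] if len(ranked) > 1 else own_bid
    return winner, payment


# Hypothetical asks for a reasoning task that converts data between ontologies.
bids = {"ben": 0.8, "carol": 0.5, "dave": 1.2}
print(reverse_vickrey(bids))          # ('carol', 0.8)
print(reverse_vickrey({"ben": 0.8}))  # ('ben', 0.8)
```
      </preformat>
      <p>A first-price rule, where the winner is paid its own bid, would be simpler but would give nodes an incentive to overstate their costs.</p>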
      <p>This proposal can have a strong impact on the scale at which we apply Semantic Web
technologies, especially in the absence of clear business models. It opens up new directions
in decentralized algorithms, and creates a connection between the Semantic Web and
agent theory, as well as economic models for incentives. We must also address
challenges such as privacy, possibly through encryption. Most importantly, this vision
sketches a Web-oriented path toward a Semantic Web for large and small players alike.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>Filecoin: A decentralized storage network</article-title>
          . Whitepaper, Protocol Labs (Aug
          <year>2017</year>
          ), https://filecoin.io/filecoin.pdf
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Three challenges for the Web, according to its inventor</article-title>
          .
          <source>World Wide Web Foundation</source>
          (Mar
          <year>2017</year>
          ), https://webfoundation.org/2017/03/web-turns-28-letter/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Buil-Aranda</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vandenbussche</surname>
            ,
            <given-names>P.Y.</given-names>
          </string-name>
          :
          <article-title>SPARQL Web-querying infrastructure: Ready for action?</article-title>
          <source>In: Proc. of the 12th Int. Semantic Web Conference</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Grubenmann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dell'Aglio</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moor</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seuken</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Decentralizing the Semantic Web: who will pay to realize it?</article-title>
          <source>In: Proceedings of the Workshop on Decentralizing the Semantic Web</source>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Mansour</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sambra</surname>
            ,
            <given-names>A.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hawke</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zereba</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Capadisli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghanem</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aboulnaga</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>A demonstration of the Solid platform for social Web applications</article-title>
          .
          <source>In: Companion Proceedings of the 25th International Conference on World Wide Web</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Nakamoto</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Bitcoin: A peer-to-peer electronic cash system</article-title>
          , https://bitcoin.org/bitcoin.pdf
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Verborgh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vander Sande</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Van Herwegen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Vocht</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>De Meester</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haesendonck</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Colpaert</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Triple Pattern Fragments: a low-cost knowledge graph interface for the Web</article-title>
          .
          <source>Journal of Web Semantics</source>
          <volume>37-38</volume>
          (Mar
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>