<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Decentralizing the Semantic Web: Who will pay to realize it?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tobias Grubenmann</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Daniele Dell'Aglio</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abraham Bernstein</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Moor</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sven Seuken</string-name>
          <email>seuken@ifi.uzh.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, University of Zurich</institution>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Fueled by the enthusiasm of volunteers, government subsidies, and open data legislation, the Web of Data (WoD) has enjoyed phenomenal growth. Commercial data, however, has been stuck in proprietary silos, as the monetization strategy for sharing data in the WoD is unclear. This contrasts with the traditional Web, where advertisement fueled a lot of the growth. This raises the question of how the WoD can (i) maintain its success when government subsidies disappear and (ii) convince commercial entities to share their wealth of data. In this paper, we propose a marketplace for decentralized data following basic WoD principles. Our approach allows a customer to buy data from different, decentralized providers in a transparent way. As such, our marketplace presents a first step towards an economically viable WoD beyond subsidies.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The Web of Data (WoD) is a machine-readable alternative to the traditional
World Wide Web. In the WoD, data is exposed in a semantically annotated
format which allows machines to easily access the information they need according
to the task they are performing. Due to the ease of integration given by the
underlying Semantic Web technologies, data sources can be queried in a federated
fashion without agreeing on a common schema beforehand. Hence, the WoD can
be seen as one big, decentralized database which can be queried over the Web.</p>
      <p>
        Without financial incentives, many promising datasets will be poorly
maintained or be unavailable, as relying on volunteers is not enough to keep the data
up-to-date and the endpoint running. Indeed, as [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] points out, only a third of
all known public endpoints have an uptime of 99% and above.
      </p>
      <p>
        In our opinion, one main reason for this is a lack of financial incentives for people
to provide data in a semantic format. Unlike in the traditional Web,
semantic data is accessed primarily by automatic agents rather than human ones.
Therefore, advertisements are completely ignored while accessing the data. An
alternative to advertisement is to charge a fee for accessing the data. Such
strategies are already pursued in the traditional Web by companies like Bloomberg,
LexisNexis, and Thomson Reuters. Also, marketplaces like the Azure
DataMarketplace allow different publishers to sell data with different subscriptions. So far,
none of these implemented markets allows users to buy data in an integrated way
from decentralized data providers. Specifically, it is not possible for a user to buy
data which consists of a join between different datasets from different sources.
Hence, applying the aforementioned subscription-based monetization strategies
to the WoD is not compatible with the idea of a decentralized Semantic Web.
How can we wean the WoD from government subsidies or federation-averse
centralization? Finding an answer to this question is crucial to fulfill the promise
of the data economy [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Our vision is to create a marketplace where decentralized data providers can
offer their data and customers can buy answers to SPARQL queries. Such a
marketplace is one possible way to make the Semantic Web independent from
subsidies and make it financially sustainable. Our marketplace differentiates
itself from existing marketplaces by these two distinctive features:
1. Customers can buy commercial data in an integrated way, meaning that a
customer can buy data from multiple commercial datasets as if they were
one big, commercial database.
2. Customers buy only the data which is needed to form the respective
query answer instead of buying all data which is involved in query execution.</p>
    </sec>
    <sec id="sec-2">
      <title>A marketplace for Semantic Data</title>
      <p>We propose to build a marketplace which allows combining data from different
sources in an integrated way. Depending on the query and the providers involved,
there might be different combinations of providers' data that yield non-empty
query answers. The market hence needs to (1) decide which datasets to include,
a process that is akin to source selection but needs to consider the prices for the
different results, and (2) determine optimal payments to each of the providers,
ensuring their participation in the marketplace.</p>
      <p>
        Most data in the Semantic Web is not located in a single endpoint but
distributed over several endpoints. Each endpoint can, potentially, contribute to
a given query answer. Searching for endpoints offering the required data
becomes cumbersome as the number of endpoints increases. A customer may need
to access a lot of data, of which only a small part (if any at all) ends up in the
result. Given these problems, we argue for a marketplace which is able to
assess individual endpoints on their usefulness for a given query and which can
help the customer to decide which data should be bought. As we have shown in
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], deciding whether accessing a certain combination of endpoints would yield
a result large enough to be worth the involved costs is a challenging task. As
the WoD gets more decentralized, it becomes unlikely that the contribution of
a single endpoint towards a query answer can be accurately evaluated
without actually executing the query. Join estimation techniques for SPARQL
queries might help to sort out endpoints which can hardly contribute towards a
query answer. However, for the remaining endpoints, only a query execution can
reveal the true contribution and value of an endpoint's data. Hence, we argue
that a market for Semantic Data in a decentralized setting has to execute a given
query on all promising endpoints before deciding which part of
the data the customer should buy. The sellers trust the marketplace
not to forward the data to the customer without payment.
      </p>
      <p>Once a query is executed on promising endpoints, the result can be rated
by the marketplace and either a buying decision can be made by the market
on behalf of the customer, or a summary of the findings can be given to the
customer who can then make a buying decision. Only after the buying decision
has been made and the involved payments have been completed will the customer
receive the actual data.</p>
      <p>Besides the buying decision, the market has to determine (1) how much
a customer has to pay for the query answer and (2) how much payment each
provider's contribution to a query answer warrants.</p>
      <p>Figure 1 shows all the steps in our marketplace:
1. The market receives a query from the customer and executes it on the
available sources.
2. Only a certain number of solutions from the original query answer are
selected to compose the final query answer the customer will receive. Either
the market makes this buying decision on behalf of the customer or the
customer decides, based on some statistics, which solutions to include in the
final query answer.
3. The customer pays the marketplace the indicated price and receives the
query answer.</p>
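      <p>The three steps can be illustrated with a minimal Python sketch. All names (Solution, execute_on_sources, select_solutions, deliver) are hypothetical, the sources are stubbed as plain functions rather than SPARQL endpoints, and the greedy budget rule in step 2 is only an assumption made for illustration; it is not a buying rule prescribed here.</p>
      <preformat>
```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Solution:
    """One solution of a query answer, with hypothetical pricing data."""
    bindings: tuple        # variable bindings, e.g. (("city", "Zurich"),)
    providers: frozenset   # providers whose data formed this solution
    price: float           # price asked for this solution


def execute_on_sources(query, sources):
    """Step 1: run the query on all available sources (stubbed as callables)."""
    solutions = []
    for source in sources:
        solutions.extend(source(query))
    return solutions


def select_solutions(solutions, budget):
    """Step 2: buying decision made by the market on the customer's behalf.

    Assumption: take the cheapest solutions first until the budget is used up.
    """
    chosen, total = [], 0.0
    for sol in sorted(solutions, key=lambda s: s.price):
        if total + sol.price > budget:
            break  # prices are ascending, so nothing later fits either
        chosen.append(sol)
        total += sol.price
    return chosen, total


def deliver(chosen, total, payment):
    """Step 3: release the data only once the indicated price is paid."""
    if payment >= total:
        return [dict(sol.bindings) for sol in chosen]
    raise ValueError("payment does not cover the indicated price")


# Two stub sources standing in for endpoints (illustrative data only).
def source_a(query):
    return [Solution((("city", "Zurich"),), frozenset({"A"}), 2.0),
            Solution((("city", "Bern"),), frozenset({"A", "B"}), 5.0)]


def source_b(query):
    return [Solution((("city", "Basel"),), frozenset({"B"}), 1.0)]


solutions = execute_on_sources("SELECT ?city WHERE { ?s ?p ?city }",
                               [source_a, source_b])
chosen, total = select_solutions(solutions, budget=4.0)
answer = deliver(chosen, total, payment=total)
```
      </preformat>
      <p>Note that in this sketch all three solutions are computed in step 1, but only the two that fit the budget are paid for and delivered, mirroring the idea that the customer buys only the data forming the final query answer.</p>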
    </sec>
    <sec id="sec-3">
      <title>Costs of a Query Answer</title>
      <p>To discuss the costs involved in producing a query answer, we distinguish
between two different roles on the sellers' side: Provider and Host.</p>
      <p>
        A provider is the originator of data, which is used in the production of a
query answer. Providers are responsible for the quality of data, including
recentness, consistency and accuracy [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Providers do not serve their data; this is done
by separate entities, the hosts. Hosts operate computers that run SPARQL
endpoints for querying data products. They provide the computational and network
resources needed to query the providers' data products. Hence, they ensure the
reliability, availability, security, and performance, which are usually specified as
Quality of Service [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The separation between host and provider enables more flexible business
models for data provision, as some providers might have an initial budget to
create data (e.g., government subsidies) but do not have the funds to cover the
operating costs for running a SPARQL endpoint or may have other reasons to
outsource the actual data provision. Providers can decide to act at the same
time as a host for their own and/or other providers' data. Nevertheless, we will
distinguish between these two different roles and treat them as separate entities.</p>
      <p>Fig. 1. The three steps from a query to the [...]</p>
      <p>Data providers might have large fixed costs, which typically accrue whilst
creating the data. The marginal costs of offering data, however, are (effectively)
zero for the provider. This is because, as discussed above, the data is not served
by providers but by hosts. Any cost that might occur while offering data is
incurred by the host. It is important to note that even if a provider acts as its
own host, the marginal costs are incurred only by the entity in its role as a host,
not as a provider.</p>
      <p>Like cloud service providers, hosts incur the fixed cost of operating the
infrastructure, possibly some variable cost relative to the size of the data they
store, and some marginal cost in the form of the computational resources spent for
each executed query. The host's marginal costs occur whenever the providers'
data are queried, independently of whether any data will eventually be bought
by a customer.</p>
      <p>Data providers rely on the hosts to make their data available to the
marketplace and thus enable customers to buy their data. Similar to a Web host for
traditional Web content, hosts in our market concept are paid by the provider,
based on some service agreement. Hence, the providers have to include the
hosting costs in their pricing decisions. The hosts' costs are already compensated
by the providers prior to query execution by the market. Thus, the hosts' costs
become transparent to the market and, as a result, the market and customer do
not have to take them into account. This facilitates the buying decision.</p>
      <p>Figure 2 shows how the payments are distributed from the marketplace to
the providers. The providers pay the hosts depending on some service agreement.
Note that the payment from a provider to its host can be independent of the
payment from the marketplace to the providers. Also note that only providers
that actually contribute to the final query answer get paid for
their services.</p>
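      <p>The payment flow described above can be sketched in Python as follows. This is an illustration under stated assumptions, not a payment rule proposed here: the equal split of a solution's price among its contributing providers and the flat hosting fee per provider are both hypothetical, as are the function names distribute_payments and provider_balances.</p>
      <preformat>
```python
from collections import defaultdict


def distribute_payments(sold_solutions):
    """Split the customer's payment among contributing providers.

    sold_solutions: list of (price, providers) pairs, one per solution in
    the final query answer. Assumption: each solution's price is shared
    equally among the providers whose data formed it.
    """
    payouts = defaultdict(float)
    for price, providers in sold_solutions:
        share = price / len(providers)
        for provider in providers:
            payouts[provider] += share
    return dict(payouts)


def provider_balances(payouts, hosting_fees):
    """Net result per provider after paying its host.

    hosting_fees: flat fee per provider from a hypothetical service
    agreement; it is owed whether or not the provider sold any data.
    """
    providers = set(payouts) | set(hosting_fees)
    return {p: payouts.get(p, 0.0) - hosting_fees.get(p, 0.0)
            for p in providers}


# Two sold solutions: one from provider A alone, one joining data of A and B.
sold = [(2.0, ["A"]), (4.0, ["A", "B"])]
payouts = distribute_payments(sold)
# Provider C is hosted but contributed nothing to the final answer.
balances = provider_balances(payouts, {"A": 1.0, "B": 1.0, "C": 1.0})
```
      </preformat>
      <p>In this example, provider C receives nothing from the marketplace because its data is not part of the final query answer, yet it still owes its hosting fee. This is why providers have to account for hosting costs when setting prices.</p>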
    </sec>
    <sec id="sec-4">
      <title>Outlook</title>
      <p>To continue growing and being able to serve as a high-quality, decentralized
data source, the WoD has to find the means to fund the creation, serving, and
maintenance of data sources. In this paper, we proposed a new vision for funding
these activities in the form of a marketplace for Semantic Data.</p>
      <p>
        As a precursor to our research, we conducted a pilot study simulating a
market platform for the WoD [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], we introduced the idea of using a
double auction for the WoD and showed the deficiency of the threshold rule in
this setting, together with three ways to correct it. In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], we studied payment
rules which are contingent on future realizations of join estimates and can be
used in combinatorial data auctions. However, these approaches assumed
that we have access to accurate join estimates to produce satisfying results, an
assumption which might be hard to enforce in the WoD.
      </p>
      <p>Based on our research, we foresee the following challenges in building a
marketplace for Semantic Data:
– Different market mechanisms have to be explored to understand their
trade-offs under various market settings.
– Given a market mechanism, providers of Semantic Data have to decide how
they will bundle and price the data they are selling. The challenge is to find
prices which will satisfy the customers and allow the providers to cover their
costs.
– Our market idea introduces a new metric for source selection, query
optimization, and query execution: financial profitability. Revisiting known
techniques and developing new techniques with respect to this new metric
will open interesting opportunities for research.
– A customer might not know enough about the structure of the offered data to
compose a query in the first place. Data providers might be required to offer
some representative sample data for free or to allow exploratory
queries (possibly with limitations) to be executed for free on their datasets.</p>
      <p>Fig. 2. The customer pays the market, which redirects the payment to the providers.
The providers pay the hosts for their services.</p>
      <p>This paper is a first step in the direction of finding stable financing for the
WoD. We plan to address the aforementioned challenges in future work and
believe that our vision of a marketplace for Semantic Data is a promising way to
ensure the financial sustainability of decentralized providers of Semantic Data.</p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the Swiss National Science Foundation
under grant #153598.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <article-title>Fuel of the future: Data is giving rise to a new economy</article-title>
          .
          <source>The Economist</source>
          ,
          <source>2017(May 6th)</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>C.</given-names>
            <surname>Buil-Aranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          , and P.-Y. Vandenbussche.
          <article-title>SPARQL web-querying infrastructure: Ready for action?</article-title>
          In A. H. et al., editor,
          <source>The Semantic Web – ISWC</source>
          <year>2013</year>
          , volume
          <volume>8219</volume>
          , pages
          <fpage>227</fpage>
          –
          <fpage>293</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>S.</given-names>
            <surname>Dustdar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pichler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Savenkov</surname>
          </string-name>
          , and
          <string-name>
            <given-names>H.-L.</given-names>
            <surname>Truong</surname>
          </string-name>
          .
          <article-title>Quality-aware service-oriented data integration: Requirements, state of the art and open challenges</article-title>
          .
          <source>In ACM SIGMOD Record</source>
          , volume
          <volume>41</volume>
          , pages
          <fpage>11</fpage>
          –
          <fpage>19</fpage>
          . ACM New York, NY, USA,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>T.</given-names>
            <surname>Grubenmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Moor</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Seuken</surname>
          </string-name>
          .
          <article-title>Challenges of source selection in the WoD</article-title>
          .
          <source>In Proceedings of the International Semantic Web Conference ISWC '17</source>
          ,
          forthcoming,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>D.</given-names>
            <surname>Moor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Grubenmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Seuken</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          .
          <article-title>A double auction for querying the web of data</article-title>
          .
          <source>In The Third Conference on Auctions, Market Mechanisms and Their Applications</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Moor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Seuken</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Grubenmann</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          .
          <article-title>Core-selecting payment rules for combinatorial auctions with uncertain availability of goods</article-title>
          .
          <source>In Twenty-Fifth International Joint Conference on Artificial Intelligence</source>
          , pages
          <fpage>424</fpage>
          –
          <fpage>432</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Zollinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Basca</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          .
          <article-title>Market-based SPARQL brokerage with matrix: Towards a mechanism for economic welfare growth and incentives for free data provision in the web of data</article-title>
          .
          <source>Technical Report IFI-2013.4</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>