<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Paths towards the Sustainable Consumption of Semantic Data on the Web</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aidan Hogan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Claudio Gutierrez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, Universidad de Chile</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Based on recent results, we argue that the right method for Web clients to access relevant information from Linked Datasets has not yet been found. We propose that something is needed between (i) Linked Data dereferencing, which is simple and reliable but too vaguely defined; (ii) data dumps, which are simple and reliable but too coarse-grained; and (iii) SPARQL querying, which is powerful and fine-grained but too unreliable. We argue that new protocols and query languages need to be investigated and define eight desiderata that an access method should meet in order to be considered sustainable for a mature Web of Data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        ( 225 ) triples for the BTC '11 crawl due to accessibility issues [4, §4.3]. In other
work, we showed that publishers often provide incomplete, ad hoc information in
dereferenced documents: e.g., we found that, on average, publishers include 83.6%
of local triples where a URI appears as subject in its respective dereferenced document,
but only 55.2% of local triples where it appears as object, and that only 32.8%
of local dereferenceable resources are assigned a human-readable label [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Downloading a data dump is coarse-grained and involves transferring a lot of irrelevant
data. The problem can be mitigated by compression: in previous work, we proposed
HDT, which can compress data dumps by a factor of 15 while offering lookup
functionality over the archive, thus tackling problems with bandwidth and helping
clients to extract relevant data offline [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. However, the full dataset still needs to be
transferred, which is wasteful for small requests, and updates are difficult to mirror.
      </p>
      <p>
        For SPARQL endpoints, evaluating even a SPARQL 1.0 query is
PSPACE-complete [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and query-planning costs become less reliable as queries grow more
complex. Relatedly, we previously showed that many public SPARQL endpoints
suffer from various issues, harming their usability: for example, we found that of
the 427 public endpoints surveyed, only 32% had an availability (i.e., "uptime")
falling into 99–100% and that only 13.3% could return more than 100,000 results
upon request (due to the popular use of result-size thresholds) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>Each access method has, in practice, exhibited issues that undermine its
sustainability; furthermore, these three access methods cannot be readily combined.</p>
      <p>Desiderata: To help define a path forward, we propose a list of eight desiderata
for sustainable data-access methods, divided into four different goals, as follows:</p>
      <p>Standardised: The first two criteria refer to the agreement that exists between
client(s) and server(s):</p>
      <p>1. Accessible: a software agent can access data through a uniform protocol
without location-specific logic. This holds for SPARQL and for dereferencing, but not
for dumps (which vary in formats, compression, access, etc.).</p>
      <p>2. Well-defined: given a query Q and a dataset D, both client and server can
precisely agree on what the response R(Q; D) should be. This holds for SPARQL,
but not for dumps (which may vary in their completeness) or dereferencing
(where dereferenced content varies from server to server).</p>
      <p>Bandwidth conservation: The second two criteria aim to minimise wasting
bandwidth in transferring irrelevant data to a client:</p>
      <p>3. Granular: the query language allows the client to specify sufficient
information in Q to avoid transferring irrelevant data. This is true for SPARQL, but
not for dereferencing (e.g., to get the capitals of countries, full descriptions for
each country must be dereferenced) or dumps.</p>
      <p>4. Pagination: a large response R(Q; D) can be served in chunks until the client
is satisfied. This is not true for dereferencing or dumps and is costly in SPARQL
(which relies on ORDER BY or vendor-specific heuristics).</p>
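      <p>As a rough illustration of why pagination is costly in SPARQL, the following Python sketch builds successive page requests using ORDER BY together with LIMIT and OFFSET. The helper name paged_query, the ex:capital predicate and the page size of 1000 are assumptions made for this example only; they do not come from any particular endpoint or library.</p>
      <preformat>
```python
def paged_query(pattern: str, order_var: str, page_size: int, page: int) -> str:
    """Return the SPARQL text for one page (0-indexed) of results.

    Hypothetical helper: a stable ORDER BY is needed so that consecutive
    LIMIT/OFFSET windows do not overlap or skip solutions.
    """
    if not (page_size >= 1 and page >= 0):
        raise ValueError("page_size must be positive and page non-negative")
    offset = page * page_size
    return (
        f"SELECT * WHERE {{ {pattern} }} "
        f"ORDER BY ?{order_var} LIMIT {page_size} OFFSET {offset}"
    )


# Assumed pattern: capitals of countries (prefix declarations omitted).
pattern = "?country ex:capital ?capital ."

# Each page is a separate request to the endpoint.
first_three = [paged_query(pattern, "country", 1000, p) for p in range(3)]
for query in first_three:
    print(query)
```
      </preformat>
      <p>Each page is an independent query, so an endpoint that does not cache intermediate results may have to evaluate and sort the full result again for every page requested.</p>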
      <p>Server efficiency: The third two criteria aim to make the access method
sustainable for the server to host:</p>
      <p>5. Cacheable: common requests are amenable to caching techniques/answerable
by direct lookup; previously computed responses can be easily re-used. This is
true for dereferencing and dumps but is prohibitive for SPARQL.</p>
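      <p>6. Costable: the server can efficiently and accurately predict the
processing/transport cost of serving R(Q; D), fostering quality of service. This is true for
dereferencing and dumps, but not for general SPARQL queries.</p>
      <table-wrap id="tab1">
        <label>Table 1</label>
        <table>
          <thead>
            <tr>
              <th />
              <th>Accessible</th>
              <th>Well-defined</th>
              <th>Granular</th>
              <th>Pagination</th>
              <th>Cacheable</th>
              <th>Costable</th>
              <th>Transparent</th>
              <th>Robust</th>
            </tr>
          </thead>
          <tbody>
            <tr>
              <td>Dereferencing</td>
              <td>X</td>
              <td />
              <td />
              <td />
              <td>X</td>
              <td>X</td>
              <td>X</td>
              <td>X</td>
            </tr>
            <tr>
              <td>Dumps</td>
              <td />
              <td />
              <td />
              <td />
              <td>X</td>
              <td>X</td>
              <td />
              <td>X</td>
            </tr>
            <tr>
              <td>SPARQL endpoints</td>
              <td>X</td>
              <td>X</td>
              <td>X</td>
              <td />
              <td />
              <td />
              <td />
              <td />
            </tr>
          </tbody>
        </table>
      </table-wrap>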
      <p>Client usability: The final two criteria refer to the needs of clients:</p>
      <p>7. Transparent: the client can determine if a dataset D is relevant for their needs
and if a service is sufficiently reliable to serve a given purpose. This is (arguably)
true for dereferencing since a client knows the topic of a page; however, a client
will not know a priori what content (or quality of service) is provided behind
a SPARQL endpoint or a dump.</p>
      <p>8. Robust: the access method can gracefully handle any type of valid request
from multitudinous clients; if exceptions occur, they are clearly identified and
accounted for by quality of service. This is true for dereferencing and dumps but
not for SPARQL (which may, e.g., fail or silently return a partial response).</p>
      <p>
        Table 1 summarises these desiderata for the three state-of-the-art Linked Data
access methods: clearly the right combination of protocol and query language has
not yet been found. New proposals for access methods that satisfy more of these
desiderata – perhaps like "Linked Data Fragments" [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] – need to be investigated in
order to meet the expectations of future applications and increased traffic.
Otherwise, if we stick with current trends, the Web of Data may continue to stagnate in
its current local maximum: a perpetual experimental phase; a nice idea.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C. B.</given-names>
            <surname>Aranda</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.-Y.</given-names>
            <surname>Vandenbussche</surname>
          </string-name>
          .
          <article-title>SPARQL Web-Querying Infrastructure: Ready for Action?</article-title>
          <source>In ISWC</source>
          , pages
          <fpage>277</fpage>
          –
          <lpage>293</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>J.</given-names>
            <surname>Fernández</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Martínez-Prieto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Arias</surname>
          </string-name>
          .
          <article-title>Binary RDF representation for publication and exchange (HDT)</article-title>
          .
          <source>JWS</source>
          ,
          <volume>19</volume>
          :
          <fpage>22</fpage>
          –
          <lpage>41</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Cyganiak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          .
          <article-title>An empirical survey of Linked Data conformance</article-title>
          .
          <source>JWS</source>
          ,
          <volume>14</volume>
          :
          <fpage>14</fpage>
          –
          <fpage>44</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>T.</given-names>
            <surname>Käfer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          .
          <article-title>Towards a Dynamic Linked Data Observatory</article-title>
          .
          <source>In LDOW</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J.</given-names>
            <surname>Pérez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Arenas</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <article-title>Semantics and complexity of SPARQL</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .,
          <volume>34</volume>
          (
          <issue>3</issue>
          ),
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hogan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Karnstedt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Parreira</surname>
          </string-name>
          .
          <article-title>Eight fallacies when querying the Web of Data</article-title>
          .
          <source>In DESWEB</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vander Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Coppens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mannens</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Van de Walle</surname>
          </string-name>
          .
          <article-title>Web-scale querying through Linked Data Fragments</article-title>
          .
          <source>In LDOW</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>