=Paper= {{Paper |id=None |storemode=property |title=Low-Cost Queryable Linked Data through Triple Pattern Fragments |pdfUrl=https://ceur-ws.org/Vol-1272/paper_10.pdf |volume=Vol-1272 |dblpUrl=https://dblp.org/rec/conf/semweb/VerborghHMHVSCCMW14a }} ==Low-Cost Queryable Linked Data through Triple Pattern Fragments== https://ceur-ws.org/Vol-1272/paper_10.pdf
                Low-Cost Queryable Linked Data
                through Triple Pattern Fragments

  Ruben Verborgh¹, Olaf Hartig², Ben De Meester¹, Gerald Haesendonck¹,
Laurens De Vocht¹, Miel Vander Sande¹, Richard Cyganiak³, Pieter Colpaert¹,
                   Erik Mannens¹, and Rik Van de Walle¹

                  ¹ Ghent University – iMinds, Belgium
                      {firstname.lastname}@ugent.be
                  ² University of Waterloo, Canada
                      ohartig@uwaterloo.ca
        ³ Digital Enterprise Research Institute, NUI Galway, Ireland
                      richard@cyganiak.de




       Abstract. For publishers of Linked Open Data, providing queryable
       access to their dataset is costly. Those that offer a public SPARQL
       endpoint often have to sacrifice high availability; others merely provide
       non-queryable means of access such as data dumps. We have developed
       a client-side query execution approach for which servers only need to
       provide a lightweight triple-pattern-based interface, enabling queryable
       access at low cost. This paper describes the implementation of a client
       that can evaluate SPARQL queries over such triple pattern fragments of
       a Linked Data dataset. Graph patterns of SPARQL queries can be solved
       efficiently by using metadata in server responses. The demonstration
       consists of a SPARQL client for triple pattern fragments that can run as
       a standalone application, browser application, or library.

       Keywords: Linked Data, Linked Data Fragments, querying, availability,
       scalability, SPARQL


1    Introduction

An ever-increasing amount of Linked Data is published on the Web, a large part
of which is freely and publicly available. The true value of these datasets becomes
apparent when users can execute arbitrary queries over them, to retrieve precisely
those facts they are interested in. The SPARQL query language [3] allows
highly precise selections to be specified, but it is very costly for servers to offer
a public SPARQL endpoint over a large dataset [6]. As a result, current public SPARQL
endpoints, often hosted by institutions that cannot afford an expensive server
setup, suffer from low availability rates [1]. An alternative for these institutions
is to provide their data in a non-queryable form, for instance, by allowing
consumers to download a data dump which they can use to set up their own private
SPARQL endpoint. However, this prohibits live querying of the data, and is in
turn rather expensive on the client side.

    In this demo, we will show a low-cost server interface that offers access to
a dataset through all of its triple patterns, together with a client that performs
efficient execution of complex queries through this interface. This enables
publishers to provide Linked Data in a queryable way at low cost. The demo
complements our paper at the ISWC 2014 Research Track [6], which explains
the principles behind the technology in depth and experimentally verifies its
scalability. The present paper details the implementation and introduces the
supporting prototype of our SPARQL client for triple pattern fragments.


2   Related Work

We contrast our approach with the three categories of current HTTP interfaces
to RDF, each of which comes with its own trade-offs regarding performance,
bandwidth, client/server processor usage, and availability.

Public SPARQL endpoints The current de facto way of providing queryable access
to triples on the Web is the SPARQL protocol, which is supported by many triple
stores such as Virtuoso, AllegroGraph, Sesame, and Jena TDB. Even though
current SPARQL interfaces offer high performance, individual queries can
consume a significant amount of server processor time and memory. Because each
client sends unique, highly specific queries, regular HTTP caching is ineffective,
since it can only optimize repeated identical requests. These factors contribute
to the low availability of public SPARQL endpoints, which has been documented
extensively [1]. This makes providing reliable public SPARQL endpoints an
exceptionally difficult challenge, incomparable to hosting regular public HTTP servers.

Linked Data servers Perhaps the most well-known alternative interface to triples
is described by the Linked Data principles. The principles require servers to
publish documents with triples about specific entities, which the client can access
through their entity-specific URI, a process which is called dereferencing. Each
of these Linked Data documents should contain data that mention URIs of other
entities, which can be dereferenced in turn. Several Linked Data querying
techniques [4] use dereferencing to solve queries over the Web of Data. This process
happens client-side, so the availability of servers is not impacted. However,
execution times are high, and many queries cannot be solved (efficiently) [6].

Other HTTP interfaces for triples Additionally, several other HTTP interfaces for
triples have been designed. Strictly speaking, the most trivial HTTP interface
is a data dump, which is a single-file representation of a dataset. The Linked
Data Platform [5] is a read/write HTTP interface for Linked Data, scheduled to
become a W3C recommendation. It details several concepts that extend beyond
the Linked Data principles, such as containers and write access. However, the
API has been designed primarily for consistent read/write access to Linked Data
resources, not to enable reliable and/or efficient query execution. The interface
we will discuss next offers low-cost publishing and client-side querying.

3    Linked Data Fragments and Triple Pattern Fragments
Linked Data Fragments [6] enable a uniform view on all possible HTTP interfaces
for triples, and allow new interfaces with different trade-offs to be defined.
Definition 1. A Linked Data Fragment (LDF) of a dataset is a resource
consisting of those triples of this dataset that match a specific selector, together with
their metadata and hypermedia controls to retrieve other Linked Data Fragments.
We define a specific type of LDF that requires minimal effort for a server to
generate, while still enabling efficient querying on the client side:
Definition 2. A triple pattern fragment is a Linked Data Fragment with
a triple pattern as selector, count metadata, and the controls to retrieve any other
triple pattern fragment of the dataset. Each page of a triple pattern fragment
contains a subset of the matching triples, together with all metadata and controls.
    Triple pattern fragments can be generated easily, as triple-pattern selection
is an indexed operation in the majority of triple stores. Furthermore, specialized
formats such as the compressed RDF format HDT (Header – Dictionary – Triples [2])
natively support fast triple-pattern extraction. This ensures low-cost servers.
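To make Definition 2 concrete, the following minimal sketch builds one page of a triple pattern fragment from an in-memory array of triples. The dataset, the ex: terms, the function name getFragmentPage, the page size, and the shape of the returned object are all assumptions made for illustration; they do not reproduce the wire format or the code of the actual servers. The count is exact here, whereas a real server may return an estimate.

```javascript
// Hypothetical in-memory dataset; each triple is [subject, predicate, object].
const dataset = [
  ["ex:Botticelli", "rdf:type", "ex:Artist"],
  ["ex:Botticelli", "ex:birthPlace", "ex:Florence"],
  ["ex:Vivaldi", "rdf:type", "ex:Artist"],
  ["ex:Vivaldi", "ex:birthPlace", "ex:Venice"],
  ["ex:Rembrandt", "rdf:type", "ex:Artist"],
  ["ex:Florence", "ex:country", "ex:Italy"],
  ["ex:Venice", "ex:country", "ex:Italy"],
];

// In this sketch, a pattern term starting with "?" is a variable.
const isVariable = (term) => term.startsWith("?");

// A triple matches if every non-variable term equals the triple's term.
// (Simplification: a variable repeated within one pattern is not checked.)
function matches(triple, pattern) {
  return pattern.every((term, i) => isVariable(term) || term === triple[i]);
}

// Build one page of the fragment selected by `pattern` (pages start at 1).
function getFragmentPage(pattern, page = 1, pageSize = 2) {
  const all = dataset.filter((triple) => matches(triple, pattern));
  const start = (page - 1) * pageSize;
  const hasNext = start + pageSize < all.length;
  return {
    // Data: the subset of matching triples that falls on this page.
    triples: all.slice(start, start + pageSize),
    // Metadata: the total number of matches across all pages.
    metadata: { totalCount: all.length },
    // Controls: how to reach the next page and any other fragment
    // (the template string is a stand-in for a hypermedia form).
    controls: {
      nextPage: hasNext ? page + 1 : null,
      template: "/fragment{?subject,predicate,object,page}",
    },
  };
}

const firstPage = getFragmentPage(["?person", "rdf:type", "ex:Artist"]);
// 2 triples on this page, 3 matches in total, next page is 2
console.log(firstPage.triples.length, firstPage.metadata.totalCount,
  firstPage.controls.nextPage);
```

Every page repeats the metadata and controls, so a client that has retrieved any single page knows the total count and how to request other patterns.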
    Clients can then efficiently evaluate SPARQL queries over the remote dataset
because each page contains an estimate of the total number of matching triples.
This allows efficient asymmetric joins by first binding those triple patterns with
the lowest number of matches. For basic graph patterns (BGPs), which are the
main building blocks of SPARQL queries, the algorithm works as follows:
 1. For each triple pattern $tp_i$ in the BGP $B = \{tp_1, \ldots, tp_n\}$, fetch the first
    page $\phi^i_1$ of the triple pattern fragment $f_i$ for $tp_i$, which contains an
    estimate $cnt_i$ of the total number of matches for $tp_i$. Choose $\epsilon$ such that
    $cnt_\epsilon = \min(\{cnt_1, \ldots, cnt_n\})$. $f_\epsilon$ is then the optimal fragment to start with.
 2. Fetch all remaining pages of the triple pattern fragment $f_\epsilon$. For each triple $t$ in
    the LDF, generate the solution mapping $\mu_t$ such that $\mu_t(tp_\epsilon) = t$. Compose
    the subpattern $B_t = \{tp \mid tp = \mu_t(tp_j) \wedge tp_j \in B\} \setminus \{t\}$. If $B_t \neq \emptyset$, find
    mappings $\Omega_{B_t}$ by recursively calling the algorithm for $B_t$. Else, $\Omega_{B_t} = \{\mu_\emptyset\}$
    with $\mu_\emptyset$ the empty mapping.
 3. Return all solution mappings $\mu \in \{\mu_t \cup \mu' \mid \mu' \in \Omega_{B_t}\}$.
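The three steps above can be sketched as a short recursive function. This is an illustration under stated assumptions, not the code of the actual client: the in-memory dataset, the ex: terms, and the helpers fetchPage, applyMapping, and evaluateBgp are hypothetical, fetchPage stands in for an HTTP request to a triple pattern fragment server, and its count is exact rather than estimated. The example pattern mirrors a search for artists born in Italian cities.

```javascript
// Hypothetical in-memory stand-in for a triple pattern fragment server.
const dataset = [
  ["ex:Botticelli", "rdf:type", "ex:Artist"],
  ["ex:Botticelli", "ex:birthPlace", "ex:Florence"],
  ["ex:Vivaldi", "rdf:type", "ex:Artist"],
  ["ex:Vivaldi", "ex:birthPlace", "ex:Venice"],
  ["ex:Rembrandt", "rdf:type", "ex:Artist"],
  ["ex:Rembrandt", "ex:birthPlace", "ex:Leiden"],
  ["ex:Galileo", "ex:birthPlace", "ex:Pisa"],
  ["ex:Florence", "ex:country", "ex:Italy"],
  ["ex:Venice", "ex:country", "ex:Italy"],
  ["ex:Pisa", "ex:country", "ex:Italy"],
  ["ex:Leiden", "ex:country", "ex:Netherlands"],
];
const PAGE_SIZE = 2;
const isVariable = (term) => term.startsWith("?");

// Simulates fetching one page of a fragment: its triples plus the count.
function fetchPage(pattern, page) {
  const all = dataset.filter((triple) =>
    pattern.every((term, i) => isVariable(term) || term === triple[i]));
  const start = (page - 1) * PAGE_SIZE;
  return { triples: all.slice(start, start + PAGE_SIZE), count: all.length };
}

// Replaces the variables bound by `mapping` with their values.
const applyMapping = (pattern, mapping) =>
  pattern.map((term) =>
    (isVariable(term) && term in mapping) ? mapping[term] : term);

function evaluateBgp(bgp) {
  // Empty pattern: the only solution is the empty mapping.
  if (bgp.length === 0) return [{}];

  // Step 1: fetch the first page of every fragment, pick the lowest count.
  const firstPages = bgp.map((pattern) => fetchPage(pattern, 1));
  let best = 0;
  firstPages.forEach((page, i) => {
    if (page.count < firstPages[best].count) best = i;
  });
  const bestPattern = bgp[best];

  // Step 2: fetch the remaining pages of the chosen fragment.
  // (A real client would follow next-page controls instead of probing.)
  const triples = [...firstPages[best].triples];
  for (let page = 2; ; page++) {
    const next = fetchPage(bestPattern, page).triples;
    if (next.length === 0) break;
    triples.push(...next);
  }

  const rest = bgp.filter((_, i) => i !== best);
  const solutions = [];
  for (const triple of triples) {
    // The mapping that turns the chosen pattern into this triple.
    const mu = {};
    bestPattern.forEach((term, i) => {
      if (isVariable(term)) mu[term] = triple[i];
    });
    // Subpattern: the other patterns with this mapping applied.
    const subPattern = rest.map((pattern) => applyMapping(pattern, mu));
    // Step 3: combine the mapping with every solution of the subpattern.
    for (const muPrime of evaluateBgp(subPattern)) {
      solutions.push({ ...mu, ...muPrime });
    }
  }
  return solutions;
}

const solutions = evaluateBgp([
  ["?person", "rdf:type", "ex:Artist"],
  ["?person", "ex:birthPlace", "?city"],
  ["?city", "ex:country", "ex:Italy"],
]);
console.log(solutions);
```

Because the pattern with the fewest matches is bound first at every level of recursion, later patterns are requested in a more specific form, which tends to keep the number of fetched pages small.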

4    Demo of Client-side Querying
The above recursive algorithm has been implemented by a dynamic pipeline
of iterators [6]. At the deepest level, a client uses TriplePatternIterators to
retrieve pages of triple pattern fragments from the server, turning the triples on
those pages into bindings. A basic graph pattern of a SPARQL query is evaluated
by a GraphPatternIterator, which first discovers the triple pattern in this graph
with the lowest number of matches by fetching the first page of the corresponding
triple pattern fragment. Then, TriplePatternIterators are recursively chained
together in the optimal order, which is chosen dynamically based on the number
of matches for each binding. More specific iterators enable other SPARQL features.
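The benefit of an iterator pipeline is that results are produced lazily: pages are only requested when a consumer asks for the next binding. The sketch below illustrates this idea with JavaScript generator functions. It is an assumption-laden illustration rather than the client's actual classes: the generator names only echo the iterator names used above, and the dataset, fetchPage, and the request counter are hypothetical stand-ins for a server and its HTTP traffic.

```javascript
// Hypothetical in-memory stand-in for a triple pattern fragment server.
const dataset = [
  ["ex:Botticelli", "rdf:type", "ex:Artist"],
  ["ex:Botticelli", "ex:birthPlace", "ex:Florence"],
  ["ex:Vivaldi", "rdf:type", "ex:Artist"],
  ["ex:Vivaldi", "ex:birthPlace", "ex:Venice"],
  ["ex:Florence", "ex:country", "ex:Italy"],
  ["ex:Venice", "ex:country", "ex:Italy"],
];
const PAGE_SIZE = 1;
let requests = 0; // counts simulated HTTP requests

const isVariable = (term) => term.startsWith("?");

function fetchPage(pattern, page) {
  requests++;
  const all = dataset.filter((triple) =>
    pattern.every((term, i) => isVariable(term) || term === triple[i]));
  const start = (page - 1) * PAGE_SIZE;
  return { triples: all.slice(start, start + PAGE_SIZE), count: all.length };
}

// Lazily turns the triples of a fragment into bindings, page by page.
// The next page is only fetched once the current one is exhausted.
function* triplePatternIterator(pattern, firstPage) {
  let current = firstPage;
  for (let page = 1; current.triples.length > 0;
       current = fetchPage(pattern, ++page)) {
    for (const triple of current.triples) {
      const binding = {};
      pattern.forEach((term, i) => {
        if (isVariable(term)) binding[term] = triple[i];
      });
      yield binding;
    }
  }
}

// Evaluates a basic graph pattern by chaining iterators recursively;
// the order is chosen anew for every binding, based on the counts.
function* graphPatternIterator(bgp) {
  if (bgp.length === 0) {
    yield {}; // the empty mapping
    return;
  }
  const firstPages = bgp.map((pattern) => fetchPage(pattern, 1));
  let best = 0;
  firstPages.forEach((page, i) => {
    if (page.count < firstPages[best].count) best = i;
  });
  const rest = bgp.filter((_, i) => i !== best);
  for (const binding of triplePatternIterator(bgp[best], firstPages[best])) {
    const bound = rest.map((pattern) => pattern.map((term) =>
      (isVariable(term) && term in binding) ? binding[term] : term));
    for (const inner of graphPatternIterator(bound)) {
      yield { ...binding, ...inner };
    }
  }
}

const results = graphPatternIterator([
  ["?person", "rdf:type", "ex:Artist"],
  ["?person", "ex:birthPlace", "?city"],
  ["?city", "ex:country", "ex:Italy"],
]);
const first = results.next().value; // pulls only what one solution needs
const requestsForFirst = requests;
const remaining = [...results];     // drains the rest of the pipeline
console.log(first, requestsForFirst, requests, remaining.length);
```

In this sketch, the first solution becomes available after only part of the requests have been made, which is the property that lets a browser application show results while the query is still running.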




Fig. 1. The demo shows how Linked Data Fragments clients, such as Web browsers,
evaluate SPARQL queries over datasets offered as inexpensive triple pattern fragments.
In the above example, a user searches for artists born in Italian cities.

   This iterator-based approach has been implemented as a JavaScript
application (Fig. 1), to allow its usage on different platforms (standalone, library,
browser application). The source code of the client, and also of triple pattern
fragment servers, is freely available at https://github.com/LinkedDataFragments/.
The versatility and efficiency of client-side querying are demonstrated through the
Web application http://client.linkeddatafragments.org, which allows users
to execute arbitrary SPARQL queries over triple pattern fragments. That way,
participants experience first-hand how low-cost Linked Data publishing solutions
can still enable efficient, real-time query execution over datasets on the Web.

References
1. Buil-Aranda, C., Hogan, A., Umbrich, J., Vandenbussche, P.Y.: SPARQL
   Web-querying infrastructure: Ready for action? In: Proceedings of the 12th
   International Semantic Web Conference (Nov 2013)
2. Fernández, J.D., Martínez-Prieto, M.A., Gutiérrez, C., Polleres, A., Arias, M.:
   Binary RDF representation for publication and exchange (HDT). Journal of Web
   Semantics 19, 22–41 (Mar 2013)
3. Harris, S., Seaborne, A.: SPARQL 1.1 query language. Recommendation, W3C (Mar
   2013), http://www.w3.org/TR/sparql11-query/
4. Hartig, O.: An overview on execution strategies for Linked Data queries.
   Datenbank-Spektrum 13(2), 89–99 (2013)
5. Speicher, S., Arwe, J., Malhotra, A.: Linked Data Platform 1.0. Working draft, W3C
   (Mar 2014), http://www.w3.org/TR/2014/WD-ldp-20140311/
6. Verborgh, R., Hartig, O., De Meester, B., Haesendonck, G., De Vocht, L.,
   Vander Sande, M., Cyganiak, R., Colpaert, P., Mannens, E., Van de Walle, R.: Querying
   datasets on the Web with high availability. In: Proceedings of the 13th International
   Semantic Web Conference (Oct 2014)