<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards E cient Query Processing over Heterogeneous RDF Interfaces?</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Gabriela Montoya</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Aebeloe</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katja Hose</string-name>
          <email>khoseg@cs.aau.dk</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aalborg University</institution>
          ,
          <country country="DK">Denmark</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Since the proposal of RDF as a standard for representing statements about entities, diverse interfaces to publish and strategies to query RDF data have been proposed. Although some recent proposals are aware of the advantages and disadvantages of state-of-the-art approaches, no work has yet tried to integrate them into a hybrid system that exploits their, in many cases, complementary strengths to process queries more e ciently than each of these approaches could do individually. In this paper, we present hybridSE, an approach that exploits the diverse characteristics of queryable RDF interfaces to e ciently process SPARQL queries. We present a brief study of the characteristics of some of the most popular RDF interfaces (brTPF and SPARQL endpoints), a method to estimate the impact of using a particular interface on query evaluation, and a method to use multiple interfaces to e ciently process a query. Our experiments, using a well-known benchmark dataset and a large number of queries, with result sizes varying from 1 up to 1 million, show that hybridSE processes queries up to three orders of magnitude faster and transfers up to four orders of magnitude less data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        E ciently evaluating SPARQL queries over heterogeneous RDF interfaces, such
as SPARQL endpoints and Triple Pattern Fragments (TPFs) [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], requires
considering their individual characteristics through all steps of query optimization
and query processing. For example, some interfaces exhibit their best
performance for answering basic queries without joins (triple pattern queries) while
other interfaces exhibit their best performance for answering queries with
multiple joins (BGP queries). However, answering SPARQL queries is expensive in
general. Even if only join, lter, and union operators are considered, deciding
whether a result mapping is an answer to a query is already NP-complete [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ].
A well-known heuristic for executing joins is reducing the sizes of
intermediate results, in particular when they have to be transferred between clients and
servers. As in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], this paper focuses on BGP queries to ease the presentation
of the proposed approach and provide insightful empirical results. However, our
approach and the gained insights are also applicable to more complex fragments
of the SPARQL query language.
      </p>
      <p>In this paper, we examine and show how the advantages of available queryable
RDF interfaces can be exploited to nd an execution plan that makes the best
use of the strengths of available interfaces. We propose hybridSE (hybrid
SPARQL Engine), a smart client that is aware of alternative interfaces for the
same dataset, exploits their strengths to evaluate di erent parts of a query and
combines partial answers to produce the nal result.</p>
      <p>In summary, this paper makes the following contributions:
Characterization of existing queryable RDF interfaces: SPARQL endpoints,
triple pattern fragments, and bindings-restricted triple pattern fragments
Estimation of the impact of di erent RDF interfaces on query execution
Method to compute good execution plans exploiting advantages of di erent
RDF interfaces</p>
      <p>
        Extensive evaluation using the well-known WatDiv benchmark [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]
This paper is organized as follows. Section 2 presents related work, Section 3
characterizes di erent RDF interfaces, Section 4 discusses the impact of RDF
interfaces on query execution, Section 5 sketches hybridSE, Section 6 discusses
our experimental results, and nally Section 7 concludes the paper.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        The basic idea of exploiting multiple data sources and interfaces has been
considered in the database community in the context of multistore systems or
polystores [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These systems enable querying di erent systems (RDBMS, NoSQL,
HDFS, etc.) depending on what would be the most e cient paradigm for a
particular query.
      </p>
      <p>
        As with other data models, SPARQL queries can be evaluated using
different interfaces, such as SPARQL endpoints, link traversal over RDF
documents [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], Triple Pattern Fragments [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], or bindings-restricted Triple Pattern
Fragments [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. These interfaces, in combination with di erent fragments of the
SPARQL language, represent restrictions on expressiveness. They have di erent
purposes and strengths, and the most optimal interface can vary depending on
the speci c characteristics of a particular query and the user's needs.
      </p>
      <p>
        Recently, Hartig et al. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] have explored the expressiveness of using di erent
LDF interfaces in combination with di erent clients from a theoretical point of
view. Even though this work mentions the advantages of using some interfaces
over others for a given class of queries and regarding certain metrics, its
practical use remains limited. For instance, although using a SPARQL endpoint to
evaluate a complete query is the best in terms of the number of requests and the
amount of transferred data, having multiple clients choosing to evaluate their
queries using a particular interface can decrease the query response time and
deteriorate the user satisfaction. To that end, using an approach like multistores
in the context of the Semantic Web could be bene cial in delivering the fastest
query execution plan. Although multistores provide a more diverse query
interface, hybridSE still relies on the same data model (RDF) but focuses on stores
with di erent and in many cases complementary strengths.
      </p>
      <p>
        E ciently querying multiple homogeneous interfaces that provide access to
di erent datasets, i.e., processing federated SPARQL queries, has been
extensively studied, in the context of SPARQL endpoints [1,3,7,11,12,17{19,22,23,25]
and LDF [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ]. Some of these works exploit knowledge about the data available
through the interfaces, for instance, estimating the cardinality of joining data
available through di erent interfaces [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] or pruning sources that only provide
redundant data [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. In contrast to these techniques, the work presented in this
paper considers heterogeneous interfaces providing access to the same dataset.
However, some of the query optimization strategies developed in the contexted of
federated SPARQL queries could be combined with our approach. For instance,
in combination with the optimizations proposed in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ], the heterogeneous
interfaces may provide access to di erent, possibly overlapping, fragments of datasets.
3
      </p>
    </sec>
    <sec id="sec-3">
      <title>Characteristics of RDF Interfaces</title>
      <p>In this section, we brie y discuss the main characteristics of some of the most
popular queryable RDF interfaces.</p>
      <sec id="sec-3-1">
        <title>3.1 SPARQL Endpoints</title>
        <p>SPARQL endpoints are speci cally designed to e ciently query RDF datasets.
Having access to the whole dataset in advance allows for computing indexes and
enables the use of tailored physical operators according to the characteristics of
the data. Therefore, SPARQL endpoints can, for instance, rely on cost functions
to determine whether a symmetric hash join is more suitable than a bound join
to evaluate a join in a given query.</p>
        <p>
          However, being able to evaluate any (or almost any) SPARQL query comes
at a cost. Query optimization in consideration of all available alternatives
(indexes, auxiliary structures, join order, join algorithm, etc.) can represent an
unnecessary overhead for relatively simple queries, for which a straightforward
optimization without considering alternatives would result in a su ciently good
query execution plan. Moreover, in some cases cost functions rely on low
quality cardinality estimations and lead to produce execution plans with very large
execution times [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Triple Pattern Fragment (TPF)</title>
        <p>
          Verborgh et al. [
          <xref ref-type="bibr" rid="ref26">26</xref>
          ] proposed the Triple Pattern Fragment (TPF) interface to
reduce query execution costs and distribute costs among server and clients. Using
TPF, the server is only responsible to provide answers to triple pattern queries,
i.e., queries composed of a single triple pattern, and the client is responsible for
processing operators such as join, union, and optional. Due to the reduction of
the load at the servers, these interfaces allow for a better scalability of services
providing access to RDF sources.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3 Bindings-Restricted Triple Pattern Fragments (brTPF)</title>
        <p>
          Even if the costs of providing access to RDF data is considerably reduced by
using TPF interfaces, the paradigm used by TPF leads to less e cient query
executions [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. In particular, the fact that the evaluation of joins relies on the
evaluation of triple patterns by the server and the subsequent construction of
new triple pattern queries with the bindings obtained from the previous triple
pattern evaluation, leads to a large amount of data transferred between server
and clients (in both directions). Therefore, Hartig and Buil Aranda [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] proposed
an extension of the TPF interface that allows to include, in addition to a triple
pattern, bindings in the queries sent to the server. This extension allows for a
signi cant reduction in the number of requests and the amount of data transferred
from the server to the clients, without decreasing the query throughput [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
3.4
        </p>
      </sec>
      <sec id="sec-3-4">
        <title>Overview</title>
        <p>
          Table 1 shows an overview of the studied interfaces. For powerful servers and
low numbers of clients, SPARQL endpoints represent the best option to e
ciently process queries as they transfer the lowest amount of data and have no
requirement on the client side resources. For very high numbers of clients with
no requirements regarding answer time and some local resources available for
query processing, the TPF interface o ers the best option. A server can respond
to a very large number of simple queries while the clients take over most of
the query processing tasks. Lastly, if clients and servers have resources available
for query processing, brTPF is the best option. As computational power spent
by brTPF servers to process requests is between SPARQL endpoints' and TPF
servers', brTPF servers can answer more requests. Moreover, as these requests
reduce the network tra c, query results are obtained faster.
In this paper, we assume that the most expressive interface, i.e., SPARQL
endpoints, takes advantage of having access to server-side resources to e ciently
process queries by relying on indexes. We further assume that the less expressive
interfaces, i.e., TPF [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], are used in combination with back-ends that provide
fast access to triple pattern fragments and good estimations of the number of
triples (cardinality) matching a triple pattern. While these assumptions clearly
impose some restrictions about the implementation of the RDF interfaces, they
hold for commonly used implementations, such as Virtuoso endpoints and TPF
with HDT backends [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Validating our conclusions using unconventional
implementations is out of the scope of this paper.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Interface Impact on Query Evaluation E</title>
    </sec>
    <sec id="sec-5">
      <title>Resource Usage ciency and</title>
      <p>Query processing time mainly depends on three aspects: (i) query processing
time at the server, (ii) transfer time of (intermediate) results from servers to
clients, and (iii) query processing time at the client. While all aspects are in
uenced by the size of intermediate results, aspects (i) and (iii) are also in uenced
by the data structures used to store intermediate results during query
processing, e.g., available indexes and the complexity of the algorithms used to process
the query.</p>
      <p>A restricted interface, such as TPF, has very low query processing time at
the server, transfer time depends on the sizes of intermediate results, and query
processing time at the client can be signi cantly high depending on the selectivity
of the triple patterns in the query. Moreover, queries with high numbers of
intermediate results lead to a high number of requests to the TPF server during
query processing, which can signi cantly slow down execution.</p>
      <p>A SPARQL endpoint has a signi cantly higher query processing time at the
server, transfer time depends on the query result size, and query processing time
at the client is very low (and usually negligible).</p>
      <p>If the SPARQL endpoint had enough query processing power to instantly
answer the queries of all users, then the best possible approach to e ciently
process queries would be to use the SPARQL endpoint in all cases. However,
the available query processing power of SPARQL endpoints is limited and this
processing power should be used mainly in cases when using other interfaces
would incur in signi cantly higher query processing times in order to improve
the overall e ciency of query processing for all clients.</p>
      <p>
        Knowing the sizes of intermediate and query results could help deciding which
interface to use to execute the queries. While computing this knowledge before
execution time has been addressed in diverse works [
        <xref ref-type="bibr" rid="ref10 ref18 ref25 ref5 ref8">5, 8, 10, 18, 25</xref>
        ], hybridSE
uses a simpler and more straightforward approach that relies on available
metadata provided by the TPF servers and heuristics. It therefore does not entail
any additional computational costs. In particular, we can use the cardinality of
a triple pattern fragment, i.e., number of triples in the triple pattern fragment,
as a generally good enough estimate is provided by TPF servers with HDT
backend, whenever a triple pattern fragment is retrieved. While processing a query
using a brTPF client, fragments and their cardinality estimation, are retrieved.
We propose to use the fragment cardinality estimation to decide if continuing
using brTPF is a good choice, or if using the SPARQL endpoint is likely to be
more e cient. If the estimated fragment cardinality is low, then using a very
simple plan, produced by the brTPF client, and retrieving the data from the
brTPF server is usually a good enough approach. However, if the estimated
fragment cardinality is high, then exploiting existing information and structures
used by SPARQL endpoints will most likely represent a signi cant advantage in
terms of query execution time. We use a threshold value to assess if the fragment
cardinality is low or high.
      </p>
      <p>
        To illustrate the impact of the interfaces on query e ciency and resource
usage, we use queries generated using the WatDiv benchmark [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] query generator
and a 10 million triples dataset. Example queries A and B are presented in
Listings 1.1 and 1.2. Table 2a shows the estimated fragment cardinalities of
the triple patterns in these queries and Table 2b shows the query execution
times and numbers of transferred bytes from the brTPF server and the Virtuoso
endpoint (version 7.2.4.2); these queries were run in setups with 4 and 16 clients.1
Listing 1.1: Query A: Find purchases of friends of reviewers who reviewed
products liked by the users followed by \User1204"
SELECT * WHERE {
wsdbm : User1204 wsdbm : follows ? v1 . ( tp1 )
? v1 wsdbm : likes ? v2 . ( tp2 )
? v2 rev : hasReview ? v3 . ( tp3 )
? v3 rev : reviewer ? v4 . ( tp4 )
? v4 wsdbm : friendOf ? v5 . ( tp5 )
? v5 wsdbm : makesPurchase ? v6 ( tp6 )
}
Listing 1.2: Query B: For products that can be delivered in country \Country4",
output their prices and information about their retailers
SELECT * WHERE {
? v1 schema : eligibleRegion wsdbm : Country4 . ( tp1 )
? v0 goodrelations : offers ? v1 . ( tp2 )
? v1 goodrelations : price ? v2 . ( tp3 )
? v0 schema : contactPoint ? v3 . ( tp4 )
? v0 schema : email ? v5 . ( tp5 )
? v0 schema : legalName ? v6 . ( tp6 )
? v0 schema : openingHours ? v7 . ( tp7 )
? v0 schema : paymentAccepted ? v8 ( tp8 )
}
      </p>
      <p>
        Query A (Listing 1.1) has one small fragment, tp1, and joins on di erent
variables. Thus, Query A is evaluated more e ciently by brTPF, which results in
a relatively low amount of data transfer in comparison to the number of results,
while the endpoint relies on cardinality estimations that are deteriorated by
the di erent object-subject joins and the unsatis ed assumption of query triple
pattern independence [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>Query B (Listing1.2) involves only fragments with relatively large estimated
cardinalities and many subject-subject joins on the same variables. Then,
using brTPF results in relatively high amounts of data transfer in comparison to
the number of results. Virtuoso is able to exploit existing indexes to e ciently
process Query B outperforming brTPF for this particular query.</p>
      <p>As the number of calls to the servers increases with the number of clients
issuing queries, the servers represent bottlenecks, which can result in higher
response times.
1 In these setups, described in Section 6, each client executes around 200 di erent
queries. Queries A and B are two example queries executed by one of the clients.</p>
    </sec>
    <sec id="sec-6">
      <title>E cient Query Execution over Available Interfaces</title>
      <p>Algorithm 1 Evaluate BGP</p>
      <p>Input: sorted triple patterns tps in the BGP; set of initial mappings ms; threshold
that guides the choice of interface</p>
      <p>Output: A set of answer mappings (ms) to BGP tps
1: function evaluateBGP (tps,ms,threshold)
2: if size(tps)=0 then
3: return ms
4: end if
5: if estimatedFragmentCardinality( rst(tps)) &gt; threshold ^ size(tps) &gt; 1 then
6: ms' evaluateBGPAtSPARQLEndpoint(tps, ms)
7: ms ms' ./ ms
8: else
9: rst rst(tps)
10: tps removeFirst(tps)
11: ms' evaluateTPAtBrTPFServer( rst, ms)
12: ms ms' ./ ms
13: ms evaluateBGP(tps, ms,threshold)
14: end if
15: return ms
16: end function</p>
      <p>The goal of hybridSE is to use a combination of multiple interfaces to
reduce the overall query execution time. The simpler interface should be used for
relatively simple computations, and if we already know there are not many
intermediate results, then an interface like brTPF is the best. However, in some cases
applying the simpler query execution approach and using the simpler interface
leads to very expensive query execution times. In such cases, we should make
use of more costly interfaces that are able to exploit knowledge about the data
to produce better execution plans.</p>
      <p>Algorithm 1 sketches how hybridSE implements the aforementioned
heuristics. The list of triple patterns and their order is described in tps. Hence, to
achieve e cient query processing, tps should be sorted according to selectivity
and this can be based on several heuristics. The order used in this paper is as
follows: The rst triple pattern is the one with the lowest estimated fragment
cardinality. The second triple pattern is the one with the greater number of
bound variables, when considering the bound variables after the evaluation of
the rst triple pattern, and so on. As such, it is more likely that fragment
cardinalities will be reduced as the number of bound variables increases. Note that
it is possible to use any other execution order for triple patterns, e.g., updating
the order of the triple patterns after the evaluation of each triple pattern query
at the brTPF server.</p>
      <p>For Query A (Listing1.1), the rst triple pattern is tp1 with an estimated
fragment cardinality of 47. The next one to execute is tp2 because after
evaluation of tp1, we know the bindings for variable ?v1 and can use them to obtain
a smaller fragment for triple pattern tp2.</p>
      <p>The estimated fragment cardinality of the triple pattern with higher
selectivity is used to determine whether brTPF should be used or if it is
necessary to use a more expressive interface, i.e., the SPARQL endpoint (Line 5),
as we consider that the triple patterns in tps are sorted according their
selectivity this is the rst triple pattern in tps. In any case, the previously
obtained mappings are passed on to the function evaluating the graph
pattern (BGP or TP), line 6 or 11, and only the mapping values that are
relevant for the pattern evaluation are included in the request sent to the server.
Listing 1.3: Query C: Find the purchases of friends of the reviewers of a set of
products
SELECT * WHERE {</p>
      <p>VALUES (? v2 ) {
( wsdbm : P r o d u c t 2 4 9 5 7 ) ( wsdbm : P r o d u c t 2 0 0 0 3 )
( wsdbm : P r o d u c t 1 4 3 9 1 ) ( wsdbm : P r o d u c t 1 8 0 3 9 )
( wsdbm : P r o d u c t 1 9 3 3 3 ) ( wsdbm : P r o d u c t 1 8 6 8 7 )
( wsdbm : P r o d u c t 1 0 4 1 5 ) ( wsdbm : P r o d u c t 1 3 6 8 3 )
( wsdbm : P r o d u c t 1 5 5 0 5 ) ( wsdbm : P r o d u c t 2 0 2 )
( wsdbm : P r o d u c t 1 2 6 2 )
}
}
? v2 rev : ha sR ev ie w ? v3 .
? v3 rev : reviewer ? v4 .
? v4 wsdbm : friendOf ? v5 .
? v5 wsdbm : m a k e s P u r c h a s e ? v6 .</p>
      <p>For Query A (Listing1.1) the rst triple pattern (tp1 ) is evaluated rst and
the 46 values of ?v1 are used to evaluate tp2 using the brTPF interface; the 46
values are used to build two brTPF requests, one with 30 bindings and another
one with 16 bindings, considering a maximum of 30 bindings per request. As
the resulting fragments have estimated cardinalities of 26 and 15, the evaluation
continues using brTPF. The 41 values obtained for ?v2 are used to build two
brTPF requests to evaluate tp3. As the retrieved fragments have estimated
fragment cardinalities of 489 and 113, thresholds of up to 112 result in evaluating
the triple patterns f tp3 . tp4 . tp5 . tp6 g using the Virtuoso endpoint. As
such, the values for ?v2 are passed on to the Virtuoso endpoint in a VALUES
clause, as shown in Listing 1.3, rather than to the brTPF server. Notice that
the di erence between the estimated fragment cardinality, 15, and the actual
fragment cardinality, 11, used to generated Query C, cf. Listing 1.3, is relatively
low. Likewise, the obtained 30 bindings from the other brTPF request, also close
to the estimated fragment cardinality of 26, are used to build a similar request
for the Virtuoso endpoint.</p>
      <p>Table 3 shows the evaluation metrics for hybridSE obtained when executing
queries A and B (Listings 1.1 and 1.2) in the setups with 4 and 16 clients
introduced in Section 4, and used to evaluate brTPF and endpoint (values in
Table 2b). While Query A has been evaluated as described above exploiting
the advantages of both brTPF and endpoint, the execution of Query B relies
exclusively on the endpoint. Using hybridSE, Query A is executed considerably
more e cient, at least two times faster for the setup with 4 clients and 6 times
faster for the setup with 16 clients, and achieves a similar performance as a
SPARQL endpoint for Query B.
To evidence the potential of hybridSE, we have implemented a prototype to
evaluate BGP queries using a combination of SPARQL endpoints and brTPF
servers. The prototype uses Algorithm 1 described in Section 5 to exploit the
advantages of each RDF interface.</p>
      <p>
        Dataset and queries: We use the WatDiv benchmark [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] query generator and
a generated 10 million triples dataset to conduct an experiment with multiple
clients { similarly as done in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. To show that hybridSE works well with a
wide range of queries, we use queries with di erent result sizes2, i.e., queries with
result sizes in the ranges (1,102], (102, 103], (103, 104], (104, 105], and (105, 106].
We consider up to 32 clients, each client having around 200 di erent queries 3
2 Queries are available on our website http://qweb.cs.aau.dk/hybridse
3 The actual number of queries per client is between 198 and 203. Some clients have
less queries because the generator failed to produce enough di erent queries within
the largest result size range.
to execute, i.e., in total around 6,400 queries are executed in the setup with 32
clients. While clients run in parallel, each client executes its queries sequentially,
i.e., in a given instant, there are up to 32 queries executed in parallel.
Implementation: The SPARQL endpoint was deployed using Virtuoso 7.2.4.2.
We used the brTPF client and server provided by its authors [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]4. hybridSE
is implemented in Node.js on top of the brTPF client implementation and it is
available on our website http://qweb.cs.aau.dk/hybridse.
      </p>
      <p>Hardware con guration: For our experiments we used virtual machines
(VMs) running Ubuntu Linux 4.4.0 64bit x86 with eight cores and CPU 2294.250
MHz with a network speed of up to 10,000Mb/s. A dedicated VM was used to
deploy the servers. A Virtuoso 7.2.4.2 endpoint was set up to use up to 7GB
of RAM, 1 million result set rows, 6,000s estimated execution time, and 600s
execution time. A brTPF server was set up to use up to 8GB of RAM. Up to
four VMs, with 16GB of RAM, were used to run clients for the experiments.
Each (client) VM ran up to eight clients simultaneously.</p>
      <p>Evaluation metrics:
1. Execution Time (ET) is the time elapsed since the query is issued until the
complete answer is produced (with a timeout of 300,000 ms),
2. Number of Transferred Bytes (NTB) is the amount of data transferred from
the brTPF server or the SPARQL endpoint to the query engine during query
evaluation. It includes the Total Number of Transferred Triples from the
brTPF server (TNTTTPF) and the Number of (sub)query Results obtained
from the SPARQL Endpoint (NRSE). NTB is computed as the sum of bytes
in the string representations of TNTTTPF and NRSE.
3. Number of Calls to the brTPF server (NCTPF) is the number of (sub)queries
sent to the brTPF server.
4. Number of Calls to the SPARQL Endpoint (NCSE) is the number of
(sub)queries sent to the SPARQL endpoint.
5. Throughput (T): is the number of queries executed by all the clients in one
time unit. It is computed as the number of queries executed by the clients
divided by the total execution time.
6.1</p>
      <sec id="sec-6-1">
        <title>Experimental Results</title>
        <p>To evaluate the performance of hybridSE, we compare its results to the
SPARQL endpoint and the brTPF client. We compare the performance of the
WatDiv queries based on the aforementioned metrics. hybridSE has as input a
threshold value representing the turning point where the least expressive
interface should be switch to the most expressive one. In Algorithm 1, a threshold is
used to choose which interface should be used to avoid ine cient processing of
4 As available at http://olafhartig.de/brTPF-ODBASE2016/ on January 2018. These
implementations have some limitations, as pointed out by Hartig in his review of our
work https://openreview.net/forum?id=BJeDc51elX, the client is purposely simple
and lacks adaptive query optimization techniques to exploit metadata obtained
during query processing and the cardinality estimations provided by the server can be
totally inaccurate.
the queries from all the clients. We tested the WatDiv queries with three di
erent thresholds: 25, 50, and 75. We also performed the WatDiv experiments with
di erent numbers of clients: 1, 4, 8, 16 and 32. However, due to space limitations,
we will only show results for 4 and 16 clients. After a comparison of hybridSE's
performance for the di erent thresholds, we will consider only threshold=50 for
the rest of the paper. Timed out queries are omitted from the plots5.
)
s
(m2e+05
e
m
i
T
n
o
itu1e+05
c
e
x
E
(a) Detailed runtime per query
Because our evaluation comprises a large number of queries and to ease the
readability of the results, we present the average values of the metrics for groups
of queries with similar amounts of transferred bytes (NTB). As each approach
might exhibit di erent NTB, we have considered for each query the average
NTB, ANTB, over the three approaches, and divided the queries into clusters
5 Fewer queries timed out using hybridSE, e.g., in the 16 clients setup, only 134 out
of 3,245 queries timed out, while 812 and 350 timed out using brTPF and endpoint.
according to ANTB, such that the results obtained for the three approaches
for each query are considered in the same cluster. Therefore, the comparison
over these clusters corresponds to comparing the approaches for the same set of
queries and eases the readability, i.e, in Figure 1b, instead of showing around
3,200 points per approach (one for each query), as in Figure 1a, we depict only
15 points (one per cluster), where each point represents the average value for
queries with ANTB in the interval [x*106,y*106) for the label [x,y) in the x-axis
of Figure 1b. Detailed results for all the setups and metrics are available at our
website http://qweb.cs.aau.dk/hybridse.</p>
        <p>Impact of the threshold on hybridSE's performance Figure 1b shows the
average execution time obtained when using di erent values of threshold in the
setup with 16 clients. We observe that as the threshold increases, the performance
of hybridSE improves in average. The di erence is quite small, having a very
close average over all the queries with just slightly worse performance for rare
queries when the threshold is 25.
Scalability of hybridSE Figure 2 shows the average execution time obtained
when using di erent values for the threshold in a setup with 4 clients, where
around 800 queries were executed. In comparison to Figure 1b, where around
3,200 queries were executed, we observe that even if the workload is multiplied
by four, the execution time increases only sub-linearly.</p>
        <p>Execution time Figure 3 shows the average execution time obtained for
hybridSE, brTPF and endpoint in the setup with 16 clients. Even if brTPF and
Endpoint faced challenging queries that have considerably degraded their
execution time for some groups of queries, hybridSE has exploited the advantages of
each of these approaches, i.e., relying on brTPF only for very simple subqueries
with one triple pattern or accessing relatively small triple pattern fragments
that contribute to reducing the di culty of the queries that are later sent to the
Fig. 4: Total Number of Transferred Bytes from the TPF server and the SPARQL
Endpoint (NTB), 16 clients setup for BrTPF, Endpoint and hybridSE
approaches (x-axis, *105)
Number of Transferred Bytes Figure 4 shows the average number of
transferred bytes for hybridSE and the brTPF and endpoint baselines in the setup
with 16 clients. Clearly the endpoint baseline transfers the lowest number of
bytes by transferring only the query results, and the brTPF baseline transfers
higher numbers of bytes because the used interface only evaluates triple pattern
queries with some binding values. hybridSE remains very close to the data
transfer incurred by the more e</p>
        <p>cient approach in this metric (endpoint),
therefore using hybridSE does not result in any considerable increase of the network
tra c with respect to the best possible alternative.</p>
        <p>● brTPF hybridSE
●
●
●
●
●
●
●
●
[0,2) [2,4) [4,6) [6,8) [8,10)[10,12)[12,14)[14,16)[16,18)[18,20)[20,22)[22,24)[24,26)[26,28)[28,30)[30,32)[32,34)</p>
        <p>Avg. Calls to the TPF server Range
(a) Total Number of Calls to the brTPF server</p>
        <p>endpoint hybridSE
r
e
rve104
s
F
P
T
the102 ●
o
t
s
ll
aC100
.
g
v
A
t 5
n
i
o
p 4
d
n
E 3
e
h
t 2
o
t
lls 1
a
.C 0
g
v
A−1
[0.5,0.[60).6,0.[70).7,0.[80).8,0.9)[0.9,1)[1,1.[11).1,1.[21).2,1.[31).3,1.[41).4,1.[51).5,1.[61).6,1.[71).7,1.[81).8,1.9)[1.9,2)[2,2.[12).1,2.[22).2,2.[32).3,2.[42).4,2.[52).5,2.6)</p>
        <p>Avg. Calls to the Endpoint Range
(b) Number of Calls to the SPARQL Endpoint</p>
        <p>System Throughput Figure 6 shows system throughput for hybridSE and
the brTPF and endpoint baselines in the setup with 16 clients. The throughput
of hybridSE is considerably higher than the other approaches. hybridSE is at
least ve times faster and up to 40 times faster than the baselines. Therefore, the
improvement is not only due to having more resources available (two interfaces
instead of one) but also due to making better use of these resources.
7</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion and Future Work</title>
      <p>In this paper, we have presented )30
hybridSE, an approach for exploit- m
/
ing the di erent capabilities of avail- q
able LDF interfaces to process queries t(#20
as e ciently as possible but with- phu
out unnecessary usage of resources. ug10
hybridSE uses available information, roh
such as the number of triples per frag- T 0
ment, to determine when it is more brTPF endpoint hybridSE
e cient to use the brTPF client or
send a (sub)query to the SPARQL Fig. 6: System throughput, 16 clients
endpoint. Our experimental evaluation
shows that hybridSE produces query execution plans that are better in terms
of data transfer and execution time than the available implementation for the
brTPF interface.</p>
      <p>
        As future work, it would be interesting to assess if the limitations of the
available brTPF implementation had a signi cant impact on our experimental
results and to extend hybridSE for more complex fragments of the SPARQL
query language. Additionally, we would like to combine hybridSE with other
interfaces, such as other TPF clients, e.g., [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Finally, we would like to integrate
our techniques into the Comunica platform [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Comunica allows to integrate
di erent approaches to process queries over a combination of diverse RDF
interfaces, however the diverse RDF interfaces are not yet exploited as alternative
interfaces for the same dataset. Therefore, integrating hybridSE into Comunica
will facilitate the use of diverse RDF interfaces that access the same datasets
and the use of the most recent implementations of the query engines hybridSE
relies upon.
      </p>
      <p>Acknowledgments. This research was partially funded by the Danish Council
for Independent Research (DFF) under grant agreement no. DFF-4093-00301
and Aalborg University's Talent Management Programme.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>M.</given-names>
            <surname>Acosta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lampo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Castillo</surname>
          </string-name>
          , and
          <string-name>
            <surname>E. Ruckhaus. ANAPSID:</surname>
          </string-name>
          <article-title>An Adaptive Query Processing Engine for SPARQL Endpoints</article-title>
          .
          <source>In ISWC'11</source>
          , pages
          <fpage>18</fpage>
          {
          <fpage>34</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>G.</given-names>
            <surname>Aluc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          , M. T. Ozsu, and
          <string-name>
            <given-names>K.</given-names>
            <surname>Daudjee</surname>
          </string-name>
          .
          <article-title>Diversi ed Stress Testing of RDF Data Management Systems</article-title>
          .
          <source>In ISWC 2014</source>
          , pages
          <fpage>197</fpage>
          {
          <fpage>212</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>H.</given-names>
            <surname>Betz</surname>
          </string-name>
          ,
          <string-name>
            <surname>F.</surname>
          </string-name>
          <article-title>Gropengie er, K. Hose</article-title>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Sattler</surname>
          </string-name>
          .
          <article-title>Learning from the History of Distributed Query Processing { A Heretic View on Linked Data Management</article-title>
          .
          <source>In COLD'12</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bondiombouy</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Valduriez</surname>
          </string-name>
          .
          <article-title>Query processing in multistore systems: an overview</article-title>
          .
          <source>IJCC</source>
          ,
          <volume>5</volume>
          (
          <issue>4</issue>
          ):
          <volume>309</volume>
          {
          <fpage>346</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>A.</given-names>
            <surname>Charalambidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Troumpoukis</surname>
          </string-name>
          , and
          <string-name>
            <surname>S. Konstantopoulos.</surname>
          </string-name>
          <article-title>SemaGrow: Optimizing Federated SPARQL queries</article-title>
          .
          <source>In SEMANTICS'15</source>
          , pages
          <fpage>121</fpage>
          {
          <fpage>128</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Mart</surname>
          </string-name>
          nez-Prieto,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Arias</surname>
          </string-name>
          .
          <article-title>Binary RDF representation for publication and exchange (HDT)</article-title>
          .
          <source>J. Web Sem</source>
          .,
          <volume>19</volume>
          :
          <fpage>22</fpage>
          {
          <fpage>41</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>O.</given-names>
            <surname>Go</surname>
          </string-name>
          <article-title>rlitz and</article-title>
          <string-name>
            <given-names>S.</given-names>
            <surname>Staab</surname>
          </string-name>
          .
          <article-title>Federated Data Management and Query Optimization for Linked Open Data</article-title>
          .
          <source>In New Directions in Web Data Management</source>
          <volume>1</volume>
          , pages
          <fpage>109</fpage>
          {
          <fpage>137</fpage>
          .
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>O.</given-names>
            <surname>Go</surname>
          </string-name>
          <article-title>rlitz and S. Staab. SPLENDID: SPARQL Endpoint Federation Exploiting VOID Descriptions</article-title>
          .
          <source>In COLD'11</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>A.</given-names>
            <surname>Gubichev</surname>
          </string-name>
          and
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          .
          <article-title>Exploiting the query structure for e cient join ordering in SPARQL queries</article-title>
          .
          <source>In EDBT'14</source>
          , pages
          <fpage>439</fpage>
          {
          <fpage>450</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>S.</given-names>
            <surname>Hagedorn</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sattler</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Umbrich</surname>
          </string-name>
          .
          <article-title>Resource Planning for SPARQL Query Execution on Data Sharing Platforms</article-title>
          .
          <source>In COLD</source>
          , pages
          <volume>49</volume>
          {
          <fpage>60</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Schenkel</surname>
          </string-name>
          .
          <article-title>Database Techniques for Linked Data Management</article-title>
          .
          <source>In SIGMOD</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Schenkel</surname>
          </string-name>
          .
          <article-title>Linked Data Management</article-title>
          . Chapman and Hall/CRC,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. B.</given-names>
            <surname>Aranda</surname>
          </string-name>
          .
          <article-title>Bindings-Restricted Triple Pattern Fragments</article-title>
          .
          <source>In OTM 2016 Conferences</source>
          , pages
          <volume>762</volume>
          {
          <fpage>779</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Freytag</surname>
          </string-name>
          .
          <article-title>Executing SPARQL Queries over the Web of Linked Data</article-title>
          .
          <source>In ISWC 2009</source>
          , pages
          <fpage>293</fpage>
          {
          <fpage>309</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          ,
          <string-name>
            <surname>I. Letter</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Perez</surname>
          </string-name>
          .
          <article-title>A Formal Framework for Comparing Linked Data Fragments</article-title>
          .
          <source>In ISWC 2017</source>
          , pages
          <fpage>364</fpage>
          {
          <fpage>382</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Herwegen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          , E. Mannens, and R. V. de Walle.
          <article-title>Query Execution Optimization for Clients of Triple Pattern Fragments</article-title>
          .
          <source>In ESWC</source>
          , pages
          <volume>302</volume>
          {
          <fpage>318</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <given-names>D.</given-names>
            <surname>Ibragimov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          , T. B.
          <string-name>
            <surname>Pedersen</surname>
            , and
            <given-names>E.</given-names>
          </string-name>
          <string-name>
            <surname>Zimanyi</surname>
          </string-name>
          .
          <article-title>Processing Aggregate Queries in a Federation of SPARQL Endpoints</article-title>
          .
          <source>In ESWC</source>
          , pages
          <volume>269</volume>
          {
          <fpage>285</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18. G. Montoya,
          <string-name>
            <given-names>H.</given-names>
            <surname>Skaf-Molli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          .
          <article-title>The Odyssey Approach for Optimizing Federated SPARQL Queries</article-title>
          .
          <source>In ISWC</source>
          , pages
          <volume>471</volume>
          {
          <fpage>489</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19. G. Montoya,
          <string-name>
            <given-names>H.</given-names>
            <surname>Skaf-Molli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Molli</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Vidal</surname>
          </string-name>
          .
          <article-title>Decomposing federated queries in presence of replicated fragments</article-title>
          .
          <source>J. Web Sem</source>
          .,
          <volume>42</volume>
          :1{
          <fpage>18</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Moerkotte. Characteristic Sets</surname>
          </string-name>
          :
          <article-title>Accurate Cardinality Estimation for RDF Queries with Multiple Joins</article-title>
          .
          <source>In ICDE'11</source>
          , pages
          <fpage>984</fpage>
          {
          <fpage>994</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>J. Perez</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Arenas</surname>
            , and
            <given-names>C.</given-names>
          </string-name>
          <string-name>
            <surname>Gutierrez</surname>
          </string-name>
          .
          <article-title>Semantics and complexity of SPARQL</article-title>
          .
          <source>ACM Trans. Database Syst</source>
          .,
          <volume>34</volume>
          (
          <issue>3</issue>
          ):
          <volume>16</volume>
          :1{
          <fpage>16</fpage>
          :
          <fpage>45</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <given-names>M.</given-names>
            <surname>Saleem</surname>
          </string-name>
          and
          <string-name>
            <given-names>A. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          . HiBISCuS:
          <article-title>Hypergraph-Based Source Selection for SPARQL Endpoint Federation</article-title>
          .
          <source>In ESWC</source>
          , pages
          <volume>176</volume>
          {
          <fpage>191</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <given-names>A.</given-names>
            <surname>Schwarte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schenkel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          .
          <source>FedX: Optimization Techniques for Federated Query Processing on Linked Data. In ISWC'11</source>
          , pages
          <fpage>601</fpage>
          {
          <fpage>616</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>R.</given-names>
            <surname>Taelman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Herwegen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Sande</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          .
          <article-title>Comunica: a Modular SPARQL Query Engine for theWeb</article-title>
          .
          <source>In ISWC</source>
          ,
          <year>2018</year>
          . To appear.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>J. Umbrich</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <string-name>
            <surname>Hose</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Karnstedt</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Harth</surname>
          </string-name>
          ,
          <article-title>and</article-title>
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          .
          <article-title>Comparing data summaries for processing live queries over Linked Data</article-title>
          .
          <source>World Wide Web</source>
          ,
          <volume>14</volume>
          (
          <issue>5- 6</issue>
          ):
          <volume>495</volume>
          {
          <fpage>544</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. V.</given-names>
            <surname>Sande</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. V.</given-names>
            <surname>Herwegen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. D.</given-names>
            <surname>Vocht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B. D.</given-names>
            <surname>Meester</surname>
          </string-name>
          , G. Haesendonck, and
          <string-name>
            <given-names>P.</given-names>
            <surname>Colpaert. Triple Pattern</surname>
          </string-name>
          <article-title>Fragments: A low-cost knowledge graph interface for the Web</article-title>
          . J. Web Sem.,
          <fpage>37</fpage>
          -
          <lpage>38</lpage>
          :
          <fpage>184</fpage>
          {
          <fpage>206</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>