Reducing the Network Load of Triple Pattern
              Fragments by Supporting Bind Joins

                            Olaf Hartig1 and Carlos Buil-Aranda2
    1
        Department of Computer and Information Science (IDA), Linköping University, Sweden
                                   olaf.hartig@liu.se
           2
              Informatics Department, Universidad Técnica Federico Santa Marı́a, Chile
                                   cbuil@inf.utfsm.cl


1       Introduction
The recently proposed Triple Pattern Fragment (TPF) interface aims at increasing the
availability of Web-queryable RDF datasets [3]. To this end, the TPF proposal trades
off an increased client-side query processing effort for a significant reduction of server
load. However, an additional aspect of this trade-off is a very high network load. To
mitigate this drawback we propose to extend the interface by augmenting TPF requests
with an optional VALUES clause as introduced in SPARQL 1.1. This extension enables
clients to attach intermediate results to triple pattern requests. The response to such a
request is expected to contain triples from the underlying dataset that do not only match
the given triple pattern, but that also are guaranteed to contribute in a join with the given
intermediate result. Hence, given such an extended interface—which we call Bindings-
Restricted Triple Pattern Fragments (brTPF)—the execution of joins can be distributed
between client and server by using the well-known bind join strategy [1].
     In an ongoing research project we study the trade-offs of the brTPF interface and
compare it to the pure TPF interface. With a poster in the conference we aim to present
initial results of this research. In particular, we would like to present a series of ex-
periments showing that distributed, bind-join-based query executions using the brTPF
interface reduce the network load drastically. In the remainder of this extended abstract,
we describe these experiments, discuss their results, and elaborate on the open research
questions that will drive the next steps of our ongoing work regarding brTPF. All digital
artifacts for our experiments are available online at http://olafhartig.de/brTPF-ISWC2016.

2       Experimental Setup
We begin by describing the metrics and the setup of our experiments. The goal of these
experiments is to compare TPF and brTPF in terms of the network load that the inter-
faces may cause when accessed by clients that execute SPARQL queries.
Metrics: For the comparison we focus on two metrics: First, the number of re-
quests (#req) that a client sends to the server during the execution of a query. Since both
the TPF interface and the brTPF interface split fragments into pages, the measurements
for #req do not correspond to the number of fragments requested during query execu-
tions but to the number of pages requested for the fragments that the client choses to
access. The second metric that we focus on is the amount of data received (dataRecv) by
the client during query executions. We measure dataRecv in terms of the number of RDF
triples contained in all fragment pages that the client receives during a query execution.
Evaluation Prototypes: As the server component in our experiments, we used an
established Java servlet implementation of the TPF interface and extended it with the
functionality to also support brTPF. The actual implementation approach used for the
latter is not relevant for the results that we present below. However, we emphasize that
the brTPF server specifies an upper bound on the number of solution mappings that can
be sent with any brTPF request. Hereafter, we refer to the upper bound as maxM/R.
     As a pure TPF client implementation, we used a JavaScript implementation of the
TPF-based query execution algorithm for SPARQL basic graph patterns (BGP) as pro-
posed by Verborgh et al. [3]. This algorithm is based on iterators that are arranged in
pipelines. Query results are computed recursively by executing the pipelines. Each of
these pipelines is generated for a subquery obtained from a decomposition of the ini-
tial BGP. Each iterator executes one of these subqueries returning as well an estimation
of the size of its response. The algorithm uses this estimation to adapt its execution
dynamically so that subqueries with a smaller result are executed first.
     For the brTPF client we implemented a simple bind-join-based query execution al-
gorithm. In contrast to the comparably sophisticated adaptive algorithm used by the TPF
client, the brTPF algorithm is kept deliberately straightforward. That is, this algorithm
simply choses a fixed query execution plan upfront. This plan represents a left-deep join
tree that is implemented using a fixed pipeline of iterators such that each of these iter-
ators is responsible for a different triple pattern of the query. The join order is decided
based on result cardinality estimates for every triple pattern of the query. During query
execution, every iterator receives chunks of solution mappings from its predecessor.
The size of these chunks corresponds to the value of maxM/R as specified by the brTPF
server. Given such a chunk, the iterator issues a brTPF request consisting of the triple
pattern that the iterator is responsible for and the solution mappings from the chunk.
Upon arrival of the data for the requested brTPF, the iterator uses this data to generate
chunks of solution mappings for the next iterator in the pipeline.
Benchmark: For the experiments we used the DBpedia 3.5.1 dataset and a sequence of
100 BGP queries that we generated for this dataset by using the FEASIBLE benchmark
generator [2]. FEASIBLE generated these queries by mimicking features that were ex-
tracted from real user queries in the log files of the DBpedia SPARQL endpoint [2].
Experimental Environment: We conducted the experiments using a single-machine
setup with a single client. That is, the combined TPF/brTPF Java servlet (with DBpedia)
is deployed on the same machine on which the client implementation performs the
sequence of query executions (using either the TPF algorithm or the brTPF algorithm).

3   Experimental Results
For our first experiment we use a page size of 100 data triples per fragment page and
execute the query sequence using the TPF client and the brTPF client, respectively. For
the latter we repeat the execution of the query sequence using a maxM/R of 5, 10, ... , 45,
and 50. The charts in Figure 1(a) and 1(b) provide an aggregated view on the resulting
measurements. In particular, Figure 1(a) illustrates the overall #req summed up for each
client over the whole sequence of queries, respectively; similarly, Figure 1(b) illustrates
the sums of the dataRecv measurements obtained for all queries, respectively.
    Regarding brTPF, we observe that the overall #req decreases with an increasing
value for maxM/R, and so does the overall dataRecv. While for #req this observation
is not surprising (if the fraction of any large intermediate result that can be sent with
each request is smaller, the brTPF client has to send more such requests), for dataRecv
we explain the observation by the fact that each fragment page contains not only data
triples but also additional metadata triples [3]. Therefore, if the number of fragment
pages requested and received is greater (as is the case for a smaller maxM/R), then so is
the overall number of these additional triples that have to be received with each page.
    By now comparing the behavior of TPF vs. brTPF in the charts in Figures 1(a)–1(b),
we notice that for both the overall #req and the overall dataRecv, brTPF achieves signifi-
cantly smaller values. At this point, we have to recall that these charts only show aggre-
gated measurements. Hence, it might still be possible that the vastly superior behavior
of brTPF as shown in these charts is actually only due to a small number of outliers.
We can verify that this is not the case by drilling into the measurements: For the dif-
ferent values of maxM/R, Figure 1(c) illustrates the number of queries for which brTPF
has a smaller (i.e., better) or greater (i.e., worse) #req than TPF. Figure 1(d) presents a
corresponding comparison for dataRecv. In Figures 1(e)–1(f), we drill in even deeper
for maxM/R=30 (corresponding charts for the other values of maxM/R look similar) and


            (a) sum of all #req                          (b) sum of all dataRecv


(c) number of queries for which brTPF has a    (d) number of queries for which brTPF has a
better (resp. same or worse) #req than TPF     better (or same, or worse) dataRecv than TPF


(e) breakdown of the number of queries in      (f) breakdown of the number of queries in
terms of the differences between #req of       terms of the differences between dataRecv of
brTPF (maxM/R=30) and of TPF                   brTPF (maxM/R=30) and of TPF

 Figure 1. Measurements of network-related metrics using FEASIBLE queries over DBpedia.
report the number of queries for which the difference between the #req (resp. dataRecv)
of brTPF vs. TPF is between 100K to 10K, between 10K to 1K, etc. These charts show
that, in terms of both #req and dataRecv, brTPF is not only better than TPF in an im-
pressively high number of cases, but for a large majority of these cases in which brTPF
is better, the differences are significant.
     To investigate whether these results are different for a different page size we con-
ducted another experiment in which we varied the page size (number of data triples per
fragment page). That is, with both the TPF client and the brTPF client, we repeated
the execution of the query sequence for different page sizes (up to 2000). Due to space
limitations, we do not include charts for this experiment in this paper. However, we
highlight that the measurements obtained by this experiment show that, for both brTPF
and TPF, the page size does not have any considerable impact on #req or on dataRecv.
In other words, the relative differences between brTPF and TPF as identified by the
first experiment are independent of the page size (and so are the relative differences
between the different maxM/R configurations for brTPF). Hence, our main conclusion
from these experiments is that, independent of the page size (and the value of maxM/R),
brTPF typically achieves a significantly smaller #req and dataRecv than TPF.

4   Future Directions
While brTPF can be used as an alternative to TPF to achieve a drastic reduction of
network load, this advantage does not come for free: In comparison to a TPF request,
responding to a brTPF request imposes more server-side work, which might have a neg-
ative impact on the throughput of a brTPF server. On the other hand, when accessed by a
brTPF-aware client, the number of requests that a brTPF server has to answer is signif-
icantly smaller than the throughput that a TPF server has to deliver (recall Figures 1(a),
1(c), and 1(e)). As a consequence, considering the whole client-server system, both ap-
proaches, brTPF and TPF, might achieve a comparable overall throughput (in terms of
full SPARQL queries executed within a given time frame). However, this is speculation.
Therefore, in the next step of our ongoing investigation of the trade-offs of TPF versus
brTPF, we will focus on throughput-related properties. To this end, we will run large-
scale experiments with an increasing number of concurrent clients.
     An important aspect of considering the throughput of approaches such as brTPF
and TPF is the degree to which the server load may be reduced by HTTP caching. We
assume that TPF would benefit more from caching than brTPF because it seems more
likely that different TPF-based query executions issue the same TPF requests than it is
for different brTPF-based executions to issue some identical brTPF requests. We will
conduct experiments to test this assumption and to identify how caching affects the
performance of both approaches.

References
1. Haas, L.M., Kossmann, D., Wimmers, E.L., Yang, J.: Optimizing Queries Across Diverse
   Data Sources. In: Proc. of the 23rd Int. Conference on Very Large Data Bases (VLDB) (1997)
2. Saleem, M., Mehmood, Q., Ngomo, A.N.: FEASIBLE: A Feature-Based SPARQL Bench-
   mark Generation Framework. In: Proc. of the 14th Int. Semantic Web Conf. (ISWC) (2015)
3. Verborgh, R., Vander Sande, M., Hartig, O., et al.: Triple Pattern Fragments: a Low-cost
   Knowledge Graph Interface for the Web. Journal of Web Semantics 37–38, 184–206 (2016)