<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scaling Out Federated Queries for Life Sciences Data in Production</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Dieter De Witte</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Laurens De Vocht</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kenny Knecht</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Filip Pattyn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hans Constandt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Erik Mannens</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ruben Verborgh</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ontoforce</institution>
          ,
          <addr-line>Ghent</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>iMinds - IDLab - Ghent University</institution>
          ,
          <addr-line>Ghent</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>There exists an abundance of Linked Data storage solutions, but only a few meet the requirements of a production environment with interlinked life sciences data. In such environments, a triple store has to support complex SPARQL queries and handle large datasets with hundreds of millions of triples. The Ontoforce platform DISQOVER offers federated search for life sciences, relying on complex federated queries over open life science data. The queries correspond to user actions in its exploratory search interface. Different state-of-the-art approaches for scaling out are compared, both in terms of their ability to execute the queries and in terms of performance. This paper analyzes and discusses the features of the datasets and query mixes. An in-depth analysis is provided showing the features of the most challenging queries.</p>
      </abstract>
      <kwd-group>
        <kwd>Evaluation</kwd>
        <kwd>Life Sciences</kwd>
        <kwd>Big RDF Data</kwd>
        <kwd>Semantic Web Tools</kwd>
        <kwd>Distributed Querying</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Life Sciences is one of the successful application domains of semantic technology.
The Life Sciences domain is interdisciplinary, which makes interlinking data
sources interesting and crucial. The Linked Open Data Cloud contains many
RDF data sources related to life sciences, but this comes with a set of challenges:
(i) the union of all datasets qualifies as Big Data and therefore puts a strain on
the available technologies for querying and (ii) these insights contain information
of multiple datasets at once, making the queries federated in nature. These
challenges are being addressed by:
An alternative is to opt for a distributed architecture:
- Horizontal scaling uses multiple (often cheap, low-end) instances in a
distributed system. Most enterprise RDF stores support parallelization, but this
can mean either a high-availability solution (data replication) or a sharded
system (data partitions) that can deal with increasingly large datasets.
- Query federation [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]: All datasets are hosted by their providers, and a
federated query engine redirects the relevant parts of each query to the right
endpoint and finally combines all the received information to solve the query.
- Native Big Data approaches typically map SPARQL queries to SQL
technologies available in the Hadoop stack: SparkSQL [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] or Impala [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Specifically in the Life Sciences domain, BioBenchmark Toyama 2012 [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] sheds
light on the capabilities of typical single-node RDF storage solutions. In this
work 5–10 queries are evaluated against 5 real datasets ranging from 10 million
to 8 billion triples. The novelty in our work lies in the fact that we are dealing
with a large set of 1,223 federated queries used in a production environment.
      </p>
      <p>
        Complementary to this work we evaluated 4 RDF databases [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] on a new
artificial benchmark named WatDiv, which guarantees diversity both in terms
of query properties and dataset properties [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Other artificial benchmarks
extensively used in the past are the Lehigh University Benchmark [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], the Berlin
SPARQL Benchmark [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], DBPBM [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and the SP2Bench [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Specifically for
NoSQL approaches to RDF, Cudré-Mauroux et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] evaluated the performance for RDF
data workloads.
      </p>
      <p>The results of this complementary work motivate the evaluation on a
real-world dataset. The main differences between this benchmark and WatDiv are
the federated nature of the data and the queries, the explicit focus on scalability,
and the fact that the current query set is rich in SPARQL features, while WatDiv
focuses on pure Basic Graph Patterns (BGPs).</p>
      <p>
        The difficulty in selecting and optimizing RDF systems is also being
addressed in two European H2020 projects: LDBC [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and HOBBIT [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. These
projects aim to create a platform that offers industry a unified approach for running
benchmarks related to their actual workloads.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Our Contribution</title>
      <p>This paper demonstrates a methodology to evaluate RDF storage solutions on
a data and query set of choice. The goal of this work is to evaluate the ability of
today's triple stores in terms of scalability with big biomedical data sources and
complex real-world queries. The pitfalls in the interpretation of the results are
highlighted and suggestions are formulated to circumvent them and draw the
right conclusions. Finally, a post-processing approach focusing on re-usability
and automation is developed.</p>
      <sec id="sec-3-1">
        <title>Benchmark dataset and queries</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>The DISQOVER platform</title>
      <p>Ontoforce has designed DISQOVER, a semantic search platform that integrates
over 110 Life Sciences data sources. Examples of these sources are PubMed,
ClinicalTrials.gov, NCBI Gene, National Drug Code, MedDRA, DrugBank and MeSH.
DISQOVER comes with an interactive UI which allows the exploration of this
huge dataset without sacrificing low query latency. The combination of query
federation and interactivity is achieved by combining a triple store with
an indexing system in which different RDF datasets are combined into
well-chosen aggregates by an ETL preprocessing pipeline. To keep up
with the growing set of data sources in their product, Ontoforce requires RDF
engines which (i) are fully SPARQL 1.1 compliant, (ii) are sufficiently mature
to operate in a production environment and (iii) allow their offering to scale
out further, thus requiring RDF solutions which support compression or horizontal
scalability.</p>
    </sec>
    <sec id="sec-5">
      <title>Ontoforce Data Analysis</title>
      <p>
        [Figure: #instances per class and #triples per predicate.]
The benchmark query mix consists of 1,223 queries which are both complex and
diverse in terms of SPARQL features. The queries are automatically generated
by the UI in the context of faceted browsing [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], making aggregation and filter
operations very common. The actual query formulation is not optimized for
performance [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>To give the reader some insight into the queries, we used the SPARQL.js
parser<sup>3</sup>, which converts SPARQL queries into JSON objects. This JSON
structure is then used to generate a feature vector per query. We distinguish between
features related to the complexity of the query structure and features which
correspond to SPARQL keyword counts. Figure 2 shows a series of features and their
occurrence distribution, which sheds further light on the complexity of
the SPARQL patterns:
1. properties of the JSON tree representation, such as the number of levels
(depth), the number of nodes (keys) and the length of the file (jsonLines);
2. properties of the query graph structure, such as the number of queries, BGPs
and triple patterns;
3. the type of triple patterns.</p>
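      <p>A minimal sketch of how such a feature vector could be derived from the parsed JSON is given below. It is not the script used in this work: the function names are illustrative, and it assumes a parser output in which each triple has subject, predicate and object keys whose values are plain strings, with variables starting with "?". Other parser versions may encode terms differently.</p>
      <preformat preformat-type="code">
```python
from collections import Counter


def is_variable(term):
    # Assumption: terms are plain strings and variables start with "?".
    return isinstance(term, str) and term.startswith("?")


def pattern_type(triple):
    # Build a label such as "tp_?p?": a letter for a bound position, "?" for a variable.
    label = "tp_"
    for letter, position in (("s", "subject"), ("p", "predicate"), ("o", "object")):
        label += "?" if is_variable(triple[position]) else letter
    return label


def depth(node):
    # Number of nesting levels in the JSON tree (a scalar counts as level 0).
    if isinstance(node, dict):
        return 1 + max((depth(v) for v in node.values()), default=0)
    if isinstance(node, list):
        return 1 + max((depth(v) for v in node), default=0)
    return 0


def count_features(node, features):
    # Walk the tree once, counting keys, (sub)queries, BGPs and triple patterns.
    if isinstance(node, dict):
        features["keys"] += len(node)
        if node.get("type") in ("query", "bgp"):
            features[node["type"]] += 1
        if node.get("type") == "bgp":
            for triple in node.get("triples", []):
                features["triplePattern"] += 1
                features[pattern_type(triple)] += 1
        for value in node.values():
            count_features(value, features)
    elif isinstance(node, list):
        for value in node:
            count_features(value, features)


def feature_vector(parsed_query):
    # Combine the structural counts and the tree depth into one dictionary.
    features = Counter()
    count_features(parsed_query, features)
    features["depth"] = depth(parsed_query)
    return dict(features)
```
</preformat>
      <p>Calling feature_vector on one parsed query yields a dictionary with entries such as depth, keys, bgp, triplePattern and one counter per triple pattern type, which can then be aggregated over the whole query mix.</p>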
      <p>The large number of queries with more than 10 triple patterns is noteworthy,
given that the WatDiv query templates<sup>4</sup> contain between 3 and 10 patterns. The
prevalence of unbound triple patterns reveals how the DISQOVER queries are built: they
start from general queries to which additional selectivity is added by ad hoc
FILTER statements. Most of these FILTER ?x = &lt;...&gt; statements can be
removed manually by substituting the constant for the corresponding ?x variable in the
triple patterns. Other FILTER operations contain an IN operator followed by a long series
of possible values for a variable, which could be rewritten as complex unions.
Half of the queries are COUNT DISTINCT queries. Most other keywords are
present as well; their effect on query runtimes is studied in Fig. 5.
3 https://www.npmjs.com/package/sparqljs
4 http://dsg.uwaterloo.ca/watdiv/basic-testing.shtml
[Figure 2: occurrence distribution of the query features depth, keys, jsonLines, query, bgp, triplePattern, tp_???, tp_sp?, tp_?p? and tp_?po; logarithmic axis from 10^0 to 10^4.]
The selected RDF storage solutions should be capable of serving in a production
environment with Life Sciences data. Important criteria are the maturity of
the solutions and the ability to handle Big Data while offering full SPARQL 1.1
compliance. The following setups were used in the benchmarks; each is given
a shorthand notation for the corresponding benchmark runs:
- Single-node references: Virtuoso 7.2.41 (V1) and Blazegraph 2.0.0 (Bla1)
single-node setups.
- Vertical scalability: analyzed by scaling down the reference Virtuoso node
to a 32GB machine (V1_32).
- Compression: Jena Fuseki is used as a SPARQL endpoint on top of
compressed HDT files, one per graph (Fu1).
- Query federation: V1 was used for individual endpoints and a separate
instance ran FluidOps FedX 3.2 with a Virtuoso Adapter. Two setups were
tested: one with a single endpoint to measure the overhead of the federation
software (Fl1) and one with 3 endpoints (Fl3).
- Horizontal scaling: Virtuoso's enterprise offering includes support for a sharded
cluster, which we tested in a 3-node setup (V3).</p>
      <p>The hardware for the benchmarks was selected from the AWS on-demand
instance offer as follows<sup>5</sup>:
- r3.2xlarge (8 vCPU, 61 GB RAM) for the triple stores and for FedX.
- c3.2xlarge (8 vCPU, 15 GB RAM) to run the benchmark client.
- r3.xlarge (4 vCPU, 30 GB RAM) for the scaled-down V1_32.</p>
      <p>For some setups, multiple independent simulations were run to improve the
confidence in our results. In the notation these are distinguished by a
simulation id, for example V3_0.
5 https://aws.amazon.com/ec2/instance-types/</p>
      <p>A PAGO (pay-as-you-go) license was used for Virtuoso<sup>6</sup>. Blazegraph<sup>7</sup> was
installed manually with the quad index<sup>8</sup> enabled. All stores were configured to
make use of the available memory and CPU. Fuseki was configured to keep half
of the HDT graphs in memory, which was the maximum feasible given the system
memory. SPARQL Query Benchmarker<sup>9</sup> was configured to run one single-threaded
warmup run and one multithreaded run with 5 concurrent threads. Each thread
runs an operation mix with all queries in a randomized order; the query timeout
was set to 20 minutes.</p>
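      <p>The run configuration described above can be illustrated with the following sketch. It does not reproduce SPARQL Query Benchmarker; it only mimics the same shape of workload (one single-threaded warmup mix, then 5 concurrent mixes in randomized order, with a 20-minute timeout passed to every query). The run_query callable is a hypothetical placeholder that the caller must supply, for instance a function that sends the query to a SPARQL endpoint.</p>
      <preformat preformat-type="code">
```python
import random
import time
from concurrent.futures import ThreadPoolExecutor

QUERY_TIMEOUT_SECONDS = 20 * 60  # 20-minute timeout, as in the setup above
CONCURRENT_THREADS = 5


def run_mix(queries, run_query, seed):
    # Run every query once, in an order shuffled with this mix's own seed.
    order = list(queries)
    random.Random(seed).shuffle(order)
    timings = []
    for query in order:
        start = time.perf_counter()
        try:
            run_query(query, QUERY_TIMEOUT_SECONDS)
            status = "ok"
        except Exception as error:  # timeouts and endpoint errors are recorded, not raised
            status = type(error).__name__
        timings.append((query, status, time.perf_counter() - start))
    return timings


def run_benchmark(queries, run_query):
    # One single-threaded warmup mix, then one mix per concurrent thread.
    warmup = run_mix(queries, run_query, seed=0)
    with ThreadPoolExecutor(max_workers=CONCURRENT_THREADS) as pool:
        futures = [pool.submit(run_mix, queries, run_query, seed)
                   for seed in range(1, CONCURRENT_THREADS + 1)]
        threads = [future.result() for future in futures]
    return warmup, threads
```
</preformat>
      <p>Recording failures as a status instead of aborting the mix keeps every thread running through all queries, which is what makes a later per-query error analysis possible.</p>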
      <sec id="sec-5-1">
        <title>Results</title>
        <p>The dataset and query properties indicate that the DISQOVER dataset will
challenge existing RDF databases. In the results section (i) we investigate the
ability of the different solutions to ingest the big dataset and to survive a
multithreaded stress test with complex queries; (ii) by diving into the logs we show
which errors occur and whether the queries are solved correctly; and finally (iii) we
analyse the features of the most time-consuming queries.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Database Survival during Ingest- and Stress-Testing</title>
      <p>A benchmark consists of a data ingest phase and a query execution phase. For
V1, 3 concurrent bulk loaders were used, which finished after 4h11m. Scaling
down to V1_32 also impacted the bulk loader, which now required 11h15m. In
the clustered setup V3, 9 bulk loaders were run and every graph was stored
on only one of the nodes; this process finished in 4h35m. The FedX (Fl) simulations
used Virtuoso as SPARQL endpoint with the data manually distributed:
a single node hosted the PubMed dataset (60%) and the other two nodes each
contained approximately 20% of the data. Bla1 took 7d12h38m to complete the
ingestion. For Fu1 the data needs to be compressed to an HDT file per graph. A
high-memory machine (256GB RAM) was used to complete this task, which was
only possible when converting the N-Quads to Turtle format before running the
in-memory HDT algorithm, which took 15h23m to complete. The time to build
all index files required by each HDT file, about 1h, must be added to the total
for Fuseki. The benchmarker software only generates runtime information upon
completion of a run. Therefore the benchmarker logs were analysed to find the
last successful query per query thread for every simulation; the results
can be seen in Fig. 3.</p>
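      <p>Finding the last successful query per thread can be sketched as follows. The event layout is an assumption made for illustration: each log entry is taken to have been reduced to a (thread id, query index, succeeded) tuple in log order, which is not necessarily how the benchmarker logs are structured.</p>
      <preformat preformat-type="code">
```python
def last_successful_query(events):
    # events: iterable of (thread_id, query_index, succeeded) tuples in log order.
    last = {}
    for thread_id, query_index, succeeded in events:
        # Make sure every thread appears, even if it never succeeded.
        last.setdefault(thread_id, None)
        if succeeded:
            last[thread_id] = query_index
    return last
```
</preformat>
      <p>A thread that maps to None never completed a query successfully, which corresponds to a store that did not survive the start of the stress test.</p>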
      <p>Some complications arose between the loading and the benchmark phase
with some of the Virtuoso runs. For V1_32 the store went offline on 3
separate occasions after initiating the benchmark client. On the third occasion we
waited for the transaction logs to be fully replayed before starting the
benchmark. The PAGO instance has a limit of 10 user sessions, which must be used for
both querying and bulk loading. For this reason the Virtuoso simulations were
restarted after the ingest phase with the HTTP threads increased to 10 (except
for V3_2, 1 bulk loader). This restart did not immediately succeed.</p>
      <p>6 https://aws.amazon.com/marketplace/pp/B011VMCZ8K/ref=srh_res_product_title?ie=UTF8&amp;sr=0-5&amp;qid=1455494788712
7 https://www.blazegraph.com/product/
8 https://wiki.blazegraph.com/wiki/index.php/Configuring_Blazegraph#Quads_Mode
9 http://sourceforge.net/p/sparql-query-bm/wiki/Introduction/</p>
      <p>[Figure 3: last successful query per query thread for the simulations Bla1, Fu1, Fl3_3, Fl3_2, Fl3_1, Fl1_3, Fl1_2, Fl1_1, V3_2, V3_1, V3_0, V1_32 and V1, including the warmup run; axis from 0 to 8000.]</p>
    </sec>
    <sec id="sec-7">
      <title>Error Frequency and Types</title>
      <p>The query event stream generated from the benchmarker logs contains
information about the type of errors, the number of results and the query runtime. Upon
comparing the number of results for every query, we discovered that it can
differ considerably between simulations. Careful analysis showed that the
benchmarker results file is incorrect when a query is not always successful. Query
failures, which are recorded with 0 results, for example change the average number
of results, whereas omitting these cases is more intuitive. The query event stream allowed
us to filter out the successful queries per thread and compare the number of
results per query. The latter was always identical between the threads of a single
database, but between systems the results can differ, an important observation
which is often overlooked by focusing exclusively on the runtimes.</p>
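      <p>The effect of failed queries on the average number of results, and the per-thread consistency check, can be sketched as follows. The tuple layout (query id, thread id, succeeded, result count) is a hypothetical simplification of the query event stream, chosen for illustration only.</p>
      <preformat preformat-type="code">
```python
from statistics import mean


def mean_results(events, skip_failures=True):
    # events: (query_id, thread_id, succeeded, result_count) tuples (hypothetical layout).
    per_query = {}
    for query_id, _thread, succeeded, count in events:
        if succeeded:
            per_query.setdefault(query_id, []).append(count)
        elif not skip_failures:
            per_query.setdefault(query_id, []).append(0)  # failure counted as 0 results
    return {query_id: mean(counts) for query_id, counts in per_query.items()}


def inconsistent_queries(events):
    # Queries whose successful runs disagree on the result count across threads.
    seen = {}
    for query_id, _thread, succeeded, count in events:
        if succeeded:
            seen.setdefault(query_id, set()).add(count)
    return sorted(q for q, counts in seen.items() if len(counts) != 1)
```
</preformat>
      <p>With skip_failures set to False, a query that returned 10 results in one thread and failed in another is averaged to 5, which illustrates why failures should be left out before result counts are compared between systems.</p>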
      <p>In Fig. 4 we show both the error frequency and the correctness. Correctness
is defined as having the maximal number of results compared to other
simulations. We used V1 in pairwise comparisons and, with the exception of 2 queries
(Fl1), its results are always correct. DISQOVER results can be obtained by
using either a triple store or an indexing system as the backend; the results of
V1 and this alternate system are consistent. For the two incorrect results
V1 returned 1,048,576 results, which seems to be an upper bound, although V1 was
explicitly configured not to limit the results. Fu1 returns more results for the
queries targeting a specific graph, but closer inspection revealed an error
in the HDT graph implementation: these queries incorrectly query the default
union graph.</p>
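      <p>The correctness criterion stated above (a result is counted as correct when it has the maximal number of results among all simulations) can be sketched as follows. The nested dictionary layout and the function name are assumptions made for illustration.</p>
      <preformat preformat-type="code">
```python
def correctness(result_counts):
    # result_counts: {simulation: {query_id: number_of_results}} for successful queries only.
    best = {}
    for counts in result_counts.values():
        for query_id, count in counts.items():
            best[query_id] = max(count, best.get(query_id, count))
    # A result is flagged correct when it equals the maximum seen for that query.
    return {
        simulation: {query_id: count == best[query_id] for query_id, count in counts.items()}
        for simulation, counts in result_counts.items()
    }
```
</preformat>
      <p>The maximum is only a proxy for correctness: as noted above for Fu1, a store can return more results because of an implementation error, so queries flagged this way still need manual inspection.</p>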
    </sec>
    <sec id="sec-8">
      <title>In-depth Analysis based on Query Features</title>
      <p>In this section we study the properties of the queries with respect to errors and
runtimes. Three important observations can be made:
- The problematic queries for V1 and V3 are in general queries which are
complicated along all query features, i.e. they have a high frequency of the
different SPARQL keywords. The feature values are in general 1 to 2 standard
deviations above those of the average query.
- The queries successfully solved by Fu1 and Bla1 are less
complicated than the average query; they correspond to BGP queries with
hardly any SPARQL operators.
- A runtime comparison cannot be considered reliable. For example, for the
V1_32 simulation 80% of the correct queries are COUNT DISTINCT queries
for which we cannot verify whether the actual counts are correct, since they
were not logged by SPARQL Query Benchmarker (only the number of results
per query is counted).</p>
      <p>In Fig. 5 we sorted all queries by descending execution time. A second axis
shows the actual runtimes, which drop from 15 minutes to 10 seconds within the first
100 queries. The runtime of a single multi-threaded query mix is 4h04m. As
mentioned in section 3, this plot gives an idea of the occurrence frequency
of certain SPARQL keywords.</p>
      <p>[Figure 5: legend: DISTINCT, GROUP, UNION, FILTER (scaled 1/3), FILTER IN (scaled 1/3), ORDER, OPTIONAL, Runtime.]</p>
      <p>We took slices of 25 or 50 queries and calculated the average feature
vector for each slice; these averages are plotted. This immediately shows that the COUNT
DISTINCT queries are the most challenging. The frequency of FILTER operators
is a poor indicator of complexity, most likely because V1's query optimizer eliminates
most of them; FILTER IN is a better indicator. The combination of OPTIONAL,
GROUP and ORDER poses the biggest challenge. Note that query features alone
cannot fully predict runtime complexity; the graph structure of the data also
plays an important role.</p>
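      <p>The slicing procedure can be sketched as follows, under the assumption that every query is available as a (runtime in seconds, feature dictionary) pair. The function name and data layout are illustrative and not taken from the post-processing scripts used in this work.</p>
      <preformat preformat-type="code">
```python
def slice_averages(queries, slice_size=50):
    # queries: list of (runtime_seconds, feature_dict) pairs, one per query.
    ordered = sorted(queries, key=lambda item: item[0], reverse=True)
    averages = []
    for start in range(0, len(ordered), slice_size):
        chunk = ordered[start:start + slice_size]
        # Features missing from a query count as 0 for that query.
        names = sorted({name for _runtime, features in chunk for name in features})
        averages.append({
            name: sum(features.get(name, 0) for _runtime, features in chunk) / len(chunk)
            for name in names
        })
    return averages
```
</preformat>
      <p>Each entry of the returned list is the average feature vector of one slice, ordered from the slowest queries to the fastest, so plotting one feature across the list shows how its frequency changes with runtime.</p>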
      <sec id="sec-8-1">
        <title>Conclusions and Future Work</title>
        <p>Complex SPARQL queries in combination with a big RDF dataset are a real
challenge for most RDF solutions. In-depth analysis of the query results leads to
the recommendation to add more diagnostics to ascertain proper operation of
RDF stores; an upgrade and extension of the SPARQL benchmarker to better
deal with errors and counts is desirable. Scientific claims about the runtimes
of the different engines are premature, but the current evaluation does show that
sharded parallelisation is the most promising approach for truly big RDF.</p>
      </sec>
      <sec id="sec-8-2">
        <title>Acknowledgments</title>
        <p>The research activities were funded by VLAIO (the Agency for Innovation and
Entrepreneurship in Flanders) in an R&amp;D project with Ontoforce, Ghent
University and imec (iMinds). iLab.t provided the required high memory infrastructure.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>G.</given-names>
            <surname>Aluc</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          , M. T. Ozsu, and
          <string-name>
            <given-names>K.</given-names>
            <surname>Daudjee</surname>
          </string-name>
          .
          <article-title>Diversified stress testing of RDF data management systems</article-title>
          .
          <source>In The Semantic Web – ISWC</source>
          <year>2014</year>
          , pages
          <fpage>197</fpage>
          –
          <lpage>212</lpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Boncz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Larriba-Pey</surname>
          </string-name>
          , I. Fundulaki,
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Erling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Neubauer</surname>
          </string-name>
          , et al.
          <article-title>The linked data benchmark council: A graph and RDF industry benchmarking effort</article-title>
          .
          <source>SIGMOD Rec</source>
          .,
          <volume>43</volume>
          (
          <issue>1</issue>
          ):
          <fpage>27</fpage>
          –
          <lpage>31</lpage>
          , May
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Schultz</surname>
          </string-name>
          . The Berlin SPARQL Benchmark.
          <source>International Journal on Semantic Web and Information Systems (IJSWIS)</source>
          ,
          <volume>5</volume>
          (
          <issue>2</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>24</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>P.</given-names>
            <surname>Cudre-Mauroux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Enchev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Fundatureanu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Haque</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Harth</surname>
          </string-name>
          ,
          <string-name>
            <surname>Keppmann</surname>
          </string-name>
          , et al.
          <article-title>NoSQL databases for RDF: an empirical evaluation</article-title>
          .
          <source>In The Semantic Web – ISWC</source>
          <year>2013</year>
          , pages
          <fpage>310</fpage>
          –
          <lpage>325</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>O.</given-names>
            <surname>Cure</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Naacke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Baazizi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>B.</given-names>
            <surname>Amann</surname>
          </string-name>
          .
          <article-title>On the evaluation of rdf distribution algorithms implemented over apache spark</article-title>
          .
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>De Witte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>De Vocht</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Mannens</surname>
          </string-name>
          , et al.
          <article-title>Big linked data ETL benchmark on cloud commodity hardware</article-title>
          .
          <source>In Proceedings of the International Workshop on Semantic Big Data, page 12. ACM</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>J. D.</given-names>
            <surname>Fernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Martínez-Prieto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gutierrez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Polleres</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Arias</surname>
          </string-name>
          .
          <article-title>Binary RDF representation for publication and exchange (HDT)</article-title>
          .
          <source>Journal of Web Semantics</source>
          ,
          <volume>19</volume>
          :
          <fpage>22</fpage>
          –
          <lpage>41</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>S.</given-names>
            <surname>Ferre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hermann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Ducasse</surname>
          </string-name>
          .
          <article-title>Semantic Faceted Search: Safe and Expressive Navigation in RDF Graphs</article-title>
          .
          <source>Research Report PI 1964</source>
          , Jan.
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Pan</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Heflin</surname>
          </string-name>
          .
          <article-title>LUBM: A benchmark for OWL knowledge base systems</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ):
          <fpage>158</fpage>
          –
          <lpage>182</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>A.</given-names>
            <surname>Loizou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Angles</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          .
          <article-title>On the formulation of performant SPARQL queries</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          ,
          <volume>31</volume>
          :
          <fpage>1</fpage>
          –
          <lpage>26</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.-C.</given-names>
            <surname>Ngonga Ngomo</surname>
          </string-name>
          .
          <article-title>DBpedia SPARQL benchmark – performance assessment with real queries on real data</article-title>
          .
          <source>The Semantic Web – ISWC</source>
          <year>2011</year>
          , pages
          <fpage>454</fpage>
          –
          <lpage>469</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Röder</surname>
          </string-name>
          .
          <article-title>Hobbit: Holistic benchmarking for big linked data</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13. A. Schatzle, M.
          <string-name>
            <surname>Przyjaciel-Zablocki</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Neu</surname>
            , and
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Lausen</surname>
          </string-name>
          .
          <source>Sempala: Interactive SPARQL Query Processing on Hadoop. In The Semantic Web - ISWC</source>
          <year>2014</year>
          - 13th International Semantic Web Conference, Proceedings,
          <string-name>
            <surname>Part</surname>
            <given-names>I</given-names>
          </string-name>
          , pages
          <volume>164</volume>
          {
          <fpage>179</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Hornung</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Lausen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Pinkel</surname>
          </string-name>
          .
          <article-title>SP<sup>2</sup>Bench: a SPARQL performance benchmark</article-title>
          . In
          <source>Data Engineering, 2009. ICDE'09. IEEE 25th International Conference on</source>
          , pages
          <fpage>222</fpage>
          –
          <lpage>233</lpage>
          . IEEE,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>A.</given-names>
            <surname>Schwarte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schenkel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          .
          <article-title>FedX: Optimization Techniques for Federated Query Processing on Linked Data</article-title>
          . In
          <source>The Semantic Web - ISWC 2011, Proceedings, Part I</source>
          , pages
          <fpage>601</fpage>
          –
          <lpage>616</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <given-names>H.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fujiwara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yamamoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Bolleman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Yamaguchi</surname>
          </string-name>
          .
          <article-title>BioBenchmark Toyama 2012: an evaluation of the performance of triple stores on biological data</article-title>
          .
          <source>Journal of biomedical semantics</source>
          ,
          <volume>5</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>