<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Benchmarking RDF Query Engines: The LDBC Semantic Publishing Benchmark</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>V. Kotsev</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>N. Minadakis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>V. Papakonstantinou</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>O. Erling</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>I. Fundulaki</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>A. Kiryakov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computer Science-FORTH</institution>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ontotext</institution>
          ,
          <country country="BG">Bulgaria</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>OpenLink Software</institution>
          ,
          <country country="NL">Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
<p>The Linked Data paradigm, which is now the prominent enabler for sharing huge volumes of data by means of Semantic Web technologies, has created novel challenges for non-relational data management technologies such as RDF and graph database systems. Benchmarking, which is an important factor in the development of research on RDF and graph data management technologies, must address these challenges. In this paper we present the Semantic Publishing Benchmark (SPB), developed in the context of the Linked Data Benchmark Council (LDBC) EU project. It is based on the scenario of the BBC media organisation, which makes heavy use of Linked Data technologies such as RDF and SPARQL. In SPB, a large number of aggregation agents provide a heavy query workload, while at the same time a steady stream of editorial agents executes update operations. In this paper we describe the benchmark's schema, data generator and workload, and report the results of experiments conducted using SPB on the Virtuoso and GraphDB RDF engines.</p>
      </abstract>
      <kwd-group>
        <kwd>RDF</kwd>
        <kwd>Linked Data</kwd>
        <kwd>Benchmarking</kwd>
        <kwd>Graph Databases</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
<p>Non-relational data management is emerging as a critical need in the era of a new
data economy, where heterogeneous, schema-less, and complexly structured data
from a number of domains are published in RDF. In this new environment, where
the Linked Data paradigm is the prominent enabler for sharing huge volumes
of data, several data management challenges arise that RDF and
graph database technologies are called to tackle. In this context, benchmarking
is an important factor in the development of research on the aforementioned
technologies and the new challenges they must address.</p>
      <p>
        To test the performance of SPARQL query evaluation techniques, a
number of benchmarks have been proposed in the literature [
        <xref ref-type="bibr" rid="ref1 ref10 ref11 ref12 ref13 ref15 ref18 ref6 ref8 ref9">1, 6, 8–13, 15, 18</xref>
        ]. Benchmarks can be used to inform users of the strengths and weaknesses
of competing software products but, more importantly, they encourage the
advancement of technology by providing both academia and industry with clear
targets for improving the performance and functionality of the systems.
      </p>
      <p>
        Existing RDF benchmarks do not fully cover important RDF and SPARQL
intricacies: their data remains relational at heart, their workloads are generally
formed by simple read-only query operations, and their queries have not been
specifically designed to stress the systems and, in particular, their weaknesses. In
addition, existing benchmarks do not consider real use-case scenarios (except
DBSB [
        <xref ref-type="bibr" rid="ref13 ref6">6, 13</xref>
        ]) and workloads. A number of benchmarks that use real
ontologies and datasets employ synthetic workloads to test the performance of
RDF systems [
        <xref ref-type="bibr" rid="ref10 ref18">10, 18</xref>
        ]; these workloads do not necessarily reflect real-world usage scenarios [
        <xref ref-type="bibr" rid="ref1 ref11 ref12 ref15 ref9">1,
9, 11, 12, 15</xref>
      </p>
      <p>In this paper we present the Semantic Publishing Benchmark (SPB),
developed in the context of the Linked Data Benchmark Council (LDBC,
http://ldbcouncil.org) European project. It is inspired by the Media/Publishing
industry, and in particular by the “Dynamic Semantic Publishing” (DSP) concept
of the BBC (British Broadcasting Corporation, http://www.bbc.com/). The BBC
maintains sets of RDF descriptions of its catalogue of creative works such as
articles, photos, and videos.</p>
      <p>
        SPB is designed to reflect a scenario where a large number of aggregation
agents provide the heavy query workload, while at the same time a steady stream
of editorial agents implement update operations that insert and delete creative
works. Journalists use the aggregation agents to query existing creative works
and use the retrieved data in order to create new ones using the editorial agents.
The Semantic Publishing Benchmark includes:
– a data generator that uses ontologies and reference datasets provided by
BBC to produce sets of creative works. The data generator supports the
creation of arbitrarily large RDF datasets (in the order of billions of triples)
that mimic the characteristics of the reference BBC datasets.
– the workload in SPB is defined by the simultaneous execution of editorial
and aggregation agents, simulating a constant load generated by end-users,
journalists, editors or automated engines. The workload is designed to reflect
a scenario where a large number of aggregation agents provide the heavy
query workload, while at the same time a steady stream of editorial agents
implement update operations that insert and delete creative works.
The SPB queries are defined in a way that tackles the choke points [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] that
each RDF store needs to address in order to satisfy the requirements raised
by real-world use cases.
– performance metrics that describe how fast an RDF database can execute
queries by simultaneously running aggregation and editorial agents.
      </p>
      <p>
        The paper is structured as follows: Section 2 briefly discusses SPB, Section 3
presents the experiments we conducted running SPB on the Virtuoso [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]
and GraphDB [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] RDF engines. Related work is presented in Section 4 and
conclusions in Section 5.
      </p>
    </sec>
    <sec id="sec-2">
      <sec id="sec-2-1">
        <title>Semantic Publishing Benchmark (SPB)</title>
        <p>The scenario of SPB is based on a media organization that maintains RDF
descriptions of its catalogue of creative works or journalistic assets: articles,
photos, videos, papers, books and movies, among others. Creative works are valid
instances of the BBC ontologies, which define numerous concepts and properties
employed to describe this content.</p>
        <p>In this section we discuss the ontologies and reference datasets (Section 2.1)
used by SPB, its data generator (Section 2.2), which uses the ontologies and
reference datasets provided by BBC to produce a set of creative works, the queries
that the benchmark introduces (Section 2.3) and finally the employed
performance metrics (Section 2.4).
</p>
        <sec id="sec-2-1-1">
          <title>Ontologies &amp; Reference Datasets</title>
          <p>SPB uses seven core and three domain RDF ontologies provided by BBC. The
former define the main entities and the properties required to describe
essential concepts of the benchmark, namely creative works, persons, documents,
BBC products (news, music, sport, education, blogs), annotations (tags),
provenance of resources, and content management system information. The latter are
used to express concepts from a domain of interest such as football, politics,
and entertainment.</p>
          <p>
            The ontologies are relatively simple: they contain 74 classes, 29 datatype
properties and 88 object properties, and shallow class and property
hierarchies (65 rdfs:subClassOf and 17 rdfs:subPropertyOf axioms). More specifically,
the class hierarchy has a maximum depth of 3, whereas the property
hierarchy has a depth of 1. The ontologies also contain restrictions (107 rdfs:domain and 117
rdfs:range declarations), as well as 8 owl:oneOf class
axioms, which allow one to define a class by enumerating its instances, and 2
owl:TransitiveProperty properties. Only the simplest possible flavor
of OWL (owl:TransitiveProperty, owl:sameAs) has been used, for nesting of
geographic locations and relations between entities in the reference dataset. A
detailed presentation of the ontologies employed by SPB can be found in [
            <xref ref-type="bibr" rid="ref4">4</xref>
            ].
          </p>
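<p>The reported hierarchy depth can be computed directly from the rdfs:subClassOf edges; the following is a minimal sketch in which the class names (and the third hierarchy level) are illustrative stand-ins, not actual BBC ontology terms:</p>

```python
# Child -> parent edges extracted from rdfs:subClassOf axioms.
# The class names are illustrative, not the actual BBC ontology terms.
sub_class_of = {
    "cwork:NewsItem": "cwork:CreativeWork",
    "cwork:BlogPost": "cwork:CreativeWork",
    "sport:MatchReport": "cwork:NewsItem",  # hypothetical third level
}

def depth(cls: str) -> int:
    """Number of rdfs:subClassOf steps from cls up to a root class."""
    steps = 0
    while cls in sub_class_of:
        cls = sub_class_of[cls]
        steps += 1
    return steps

# The maximum over all classes is the depth of the class hierarchy.
max_depth = max(depth(c) for c in sub_class_of)
```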
          <p>SPB also uses reference datasets that are employed by the data generator to
produce the data of interest. These datasets are snapshots of the real datasets
provided by BBC; in addition, GeoNames and DBpedia reference datasets have
been included to further enrich the annotations with geo-locations and person
data, enabling the formulation of geo-spatial queries.
</p>
        </sec>
        <sec id="sec-2-1-2">
          <title>Data Generation</title>
          <p>The SPB data generator produces RDF descriptions of creative works that are
valid instances of the BBC ontologies presented previously. A creative work is
described by a number of data value and object value properties; a creative work
also has properties that link it to resources defined in reference datasets: those
are the about and mentions properties, and their values can be any resource.
One of the purposes of the data generator is to produce large synthetic
datasets (in the order of billions of triples) in order to check the ability of
the engines to scale. The generator models three types of relations in the data:
Clustering of data. The clustering effect is produced by generating creative
works about a single entity from the reference datasets for a fixed period of time.
The number of creative works starts with a high peak at the beginning of the
clustering period and follows a smooth decay towards its end. The data generator
produces major and minor clusterings with sizes (i.e., numbers of creative works)
of different magnitudes. By default, five major and one hundred minor clusterings
of data are produced for a one-year period. An example of clustering could
be news items about an event: a high number of journalistic assets relate to the
event at first, followed by a decay over time as the end of the period is reached,
a tendency that mirrors a real-world scenario in which a 'fresh'
event is popular and its popularity decreases as time goes by.</p>
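<p>The peak-then-decay allocation described above can be sketched as follows; this is a toy model rather than the LDBC generator's actual code, and the exponential shape and parameter names are assumptions:</p>

```python
import math

def clustering_curve(total_works: int, days: int, decay: float = 5.0) -> list[int]:
    """Spread creative works over a clustering period: a high peak at the start
    of the period followed by a smooth (here: exponential) decay towards its end."""
    weights = [math.exp(-decay * day / days) for day in range(days)]
    scale = total_works / sum(weights)
    return [round(w * scale) for w in weights]

curve = clustering_curve(total_works=1000, days=30)
# Daily counts never increase: a 'fresh' event attracts many journalistic
# assets, and its popularity decreases as time goes by.
assert all(a >= b for a, b in zip(curve, curve[1:]))
```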
          <p>Correlations of entities. This correlation effect is produced by generating
creative works about two or three entities from the reference data in a fixed period
of time. Each of the entities is tagged by creative works alone at the beginning and
end of the correlation period, while in the middle of it both are used as tags for the
same creative work. By default, fifty correlations between entities are modelled
for a one-year period. An example of data correlation could be that several
'popular' persons (e.g., Strauss-Kahn and Barroso) are mentioned together by
creative works for a certain period of time.</p>
          <p>Random tagging of entities. Random data distributions are defined with
a bias towards popular entities, applied when tagging is performed, that is,
when values are assigned to the about and mentions properties of a creative work.
This is achieved by randomly selecting 5% of all the resources from the reference
data and marking them as popular, while the remaining ones are marked as regular.
When creating creative works, 30% of them are tagged with randomly selected
popular resources and the remaining 70% are linked to regular ones. An example
of random tagging could be every-day events, which become less important
several days after their start date. The distributions of about and mentions tags
analysed from a 'live' dataset provided by the BBC are reproduced in the generated
data. Table 1 shows the distribution of the total about and mentions tags found in
creative works, as well as their individual distributions.</p>
          <p>In addition, the current version triples the
number of about and mentions tags used in creative works, thus providing
better interconnectedness of entities across the whole dataset. This random
generation of data concerns only one third of all generated data; the remaining data
is generated with the correlation and clustering effects modelled as previously
described.</p>
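<p>The 5%/95% popularity split and the 30%/70% tagging bias can be sketched as follows; this is an illustrative model under assumed parameter names, not the generator's actual code:</p>

```python
import random

def split_popular(resources, seed=42):
    """Randomly mark 5% of the reference resources as popular; the rest are regular."""
    rng = random.Random(seed)
    popular = rng.sample(resources, len(resources) // 20)  # 5% of all resources
    popular_set = set(popular)
    regular = [r for r in resources if r not in popular_set]
    return popular, regular

def tag(popular, regular, rng):
    """30% of creative works get a popular resource as the value of their
    about/mentions property; the remaining 70% are linked to a regular one."""
    pool = popular if rng.random() < 0.30 else regular
    return rng.choice(pool)

resources = [f"entity-{i}" for i in range(200)]
popular, regular = split_popular(resources)
```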
          <p>The SPB data generator operates in a sequence of phases:
1. the ontologies and reference datasets are loaded into an RDF repository;
2. all instances of the domain ontologies that exist in the reference datasets
are retrieved by means of predefined SPARQL queries; these instances are later
used as values for the about and mentions properties of creative works;
3. from the previous set of instances, the popular and regular entities are
selected;
4. the generator produces the creative works according to the three properties
discussed previously.
SPB is designed to reflect a scenario where a large number of aggregation agents
provide the heavy query workload requesting creative works, while at the same
time a steady stream of editorial agents execute update operations that insert
and delete creative works.</p>
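<p>The phases above can be sketched with plain tuples standing in for an RDF repository; the prefixes and property names below are illustrative, not the actual BBC vocabulary, and a simple pattern match stands in for the predefined SPARQL queries:</p>

```python
# (subject, predicate, object) tuples stand in for an RDF repository.
reference_data = {
    ("ref:Football", "rdf:type", "domain:Sport"),
    ("ref:Elections", "rdf:type", "domain:Politics"),
    ("ref:Athens", "rdf:type", "geo:Location"),
}

def domain_instances(triples):
    """Phase 2: retrieve the instances of the domain ontologies (a plain
    pattern match standing in for the predefined SPARQL queries)."""
    return sorted(s for s, p, o in triples
                  if p == "rdf:type" and o.startswith("domain:"))

def creative_work(work_id, about, mentions):
    """Phase 4: emit the triples describing one creative work, linking it to
    reference entities through the about and mentions properties."""
    s = f"cwork:{work_id}"
    return [(s, "rdf:type", "cwork:CreativeWork"),
            (s, "cwork:about", about),
            (s, "cwork:mentions", mentions)]

entities = domain_instances(reference_data)  # popular/regular split is phase 3
cw = creative_work(1, about=entities[0], mentions=entities[1])
```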
          <p>
            Aggregation Queries. SPB queries are valuable in the sense that they stress
important technical functionality that systems must support. P. Boncz [
            <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
            ] uses
the term choke points to refer to difficulties in workloads that address
elements of particular technical functionality and can be used for designing useful
benchmarks. These choke points arise from the data distributions, the queries and
updates, and the workloads that implement the latter two. Examples of such
choke points are aggregation and join performance, data access locality,
expression calculation, parallelism, concurrency and correlated subqueries.
          </p>
          <p>SPB queries are defined in a way that tackles the choke points that each
RDF store needs to address in order to satisfy the requirements raised by
real-world use cases. The benchmark comes with two types of workloads: the
base and advanced versions. The former tackles challenges related to SPARQL
query processing, whereas the latter extends the former with versioning
and backup database functionalities. In this paper we restrict ourselves to the
presentation of the base version of SPB, since it addresses the features
interesting for SPARQL query processing. A subset of the SPB queries is based on
real queries obtained from the applications that BBC journalists use to access
existing creative works in order to create new ones; the remaining queries build
on these complex queries and are enhanced with different features of SPARQL
1.1.</p>
          <p>We first present the choke points (CP) that the SPB queries implement, and
then we provide a detailed explanation of the SPB queries and the choke points
each of them addresses.
cp1: join ordering. This choke point tests whether the optimizer can evaluate
the trade-off between the time spent finding the best execution plan and the
quality of the resulting plan. It also tests the ability of the engine to consider
cardinality constraints expressed by the different kinds of properties defined in
the ontologies (such as functional properties and inverse functional properties).
cp2: aggregation. Aggregations are implemented with the use of sub-selects
in the SPARQL query; the optimizer should recognize the operations included
in the sub-selects and evaluate them first.
cp3: optional and nested optional clauses. This choke point tests
the ability of the optimizer to produce a plan where the optional
triple patterns are executed last, since optional clauses do not reduce
the size of intermediate results.
cp4: reasoning. Reasoning tests check whether the systems
handle RDFS and OWL constructs efficiently.
cp5: parallel execution of unions. This choke point tests the ability
of the optimizer to produce plans where unions are executed in parallel. This is
especially helpful if the involved subqueries produce a large number of results.
cp6: optionals with filters. This choke point tests the ability of
the engines to evaluate filter expressions as early as possible in order to eliminate
a possibly large number of intermediate results. It is
similar to cp3, with the difference that here the optional clauses contain filters
that bind variables to specific values; filters reduce (in some cases significantly)
the size of the intermediate results. Such queries (related to cp3 and cp6) are of
high importance, as they are often found in real-world scenarios.
cp7: ordering. This test checks the ability of the optimizer to choose query
plan(s) that facilitate the ordering of results.
cp8: geo-spatial predicates. This choke point tests the ability of the
system to handle queries over geospatial data; queries that mention entities within
a specific geospatial range address this technical challenge.
cp9: full text search. Queries that involve the evaluation of regular
expressions on data value properties of resources address this choke point.
cp10: duplicate elimination. This choke point tests the ability of the
system to identify duplicate entries and eliminate them during the creation of
intermediate results.
cp11: complex filter conditions. Filter conditions that involve
negation, conjunction and disjunction can be used to test whether the optimizer
is able to split the filter conditions into conjunctions of conditions and execute
them in parallel and as early as possible.</p>
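<p>The interaction of cp3 and cp6 can be illustrated with a toy plan comparison: an OPTIONAL clause (a left outer join) never shrinks the intermediate result, while a FILTER can shrink it drastically, so a good plan evaluates filters early and optional patterns last. Plain Python lists stand in for solution bindings here, and the sizes are illustrative:</p>

```python
# 1000 creative works; 10% have the "Textual" primaryFormat, half have an
# (optional) thumbnail. Property names mirror the SPB schema loosely.
works = [{"id": i, "format": ("Textual" if i % 10 == 0 else "Picture")}
         for i in range(1000)]
thumbnails = {i: f"thumb{i}" for i in range(0, 1000, 2)}  # optional property

def plan_filter_first(works):
    """Good plan: FILTER early, OPTIONAL last."""
    sizes = []
    step1 = [w for w in works if w["format"] == "Textual"]            # FILTER
    sizes.append(len(step1))
    step2 = [{**w, "thumb": thumbnails.get(w["id"])} for w in step1]  # OPTIONAL
    sizes.append(len(step2))
    return sizes

def plan_optional_first(works):
    """Bad plan: OPTIONAL early inflates the intermediate result."""
    sizes = []
    step1 = [{**w, "thumb": thumbnails.get(w["id"])} for w in works]  # OPTIONAL
    sizes.append(len(step1))
    step2 = [w for w in step1 if w["format"] == "Textual"]            # FILTER
    sizes.append(len(step2))
    return sizes

# Same final answer, but a 10x larger intermediate result for the bad plan.
assert plan_filter_first(works) == [100, 100]
assert plan_optional_first(works) == [1000, 100]
```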
          <p>In Table 2 we provide the 12 queries that comprise the base version of SPB.
For each of them we provide a textual description and enumerate the choke
points it addresses. All the SPARQL queries can be found online at
https://github.com/ldbc/ldbc_spb_bm_2.0/tree/master/datasets_and_queries/sparql/basic/aggregation_standard.
Queries Q1, Q2, Q3, Q4 and Q8 use the CONSTRUCT query form, while Q5, Q6,
Q7, Q9, Q10, Q11 and Q12 use SELECT.</p>
          <p>Table 2. The SPB base query mix: for each query, its description and the choke points it addresses.</p>
          <p>Q1 (cp1, cp2, cp3, cp4, cp5): Retrieve the 10 most recent creative works that
are about or mention different topics. For each creative work a graph is returned
that comprises the work's title, shortTitle, dateCreated, dateModified, description,
primaryFormat and primaryContentOf properties; if the creative work has a
thumbnail, its thumbnailAltText and thumbnailType are also returned. Also
retrieve the existing properties of the topics that the creative work is about or
mentions: aboutLabel, aboutShortLabel, aboutPreferredLabel, mentionsLabel,
mentionsShortLabel and mentionsPreferredLabel.</p>
          <p>Q2 (cp1, cp3, cp4): Retrieve details about a given resource that is an instance
of class CreativeWork or its subclasses, namely its title, dateCreated,
dateModified, the topic the resource is about, its primaryContentOf and the
webDocumentType thereof.</p>
          <p>Q3 (cp3, cp5, cp6, cp10, cp11): Retrieve a list of creative works that are
instances of classes BlogPost or NewsItem, that are about a specific topic, and
whose primaryFormat is one of TextualFormat, InteractiveFormat or
PictureGalleryFormat. If the creative work has an audience, then the creative work
should be obtained using this value. Given the retrieved list of creative works,
return for each resource its related nodes, and order the result in descending
order of the value of their creationDate property.</p>
          <p>Q4 (cp1, cp7): Return a list of all creative works, with all their properties,
that are about a given topic, have a given primaryFormat, and are instances of
a certain subclass of class CreativeWork. The results should be ordered by
property creationDate and only N results should be returned.</p>
          <p>Q5 (cp1, cp2, cp3, cp7, cp11): Return the list of the most popular topics that
creative works of a given type are about, where the works have a given audience
and have been created in a specific time range (dateModified is between a start
and an end date). For each retrieved topic get its properties cannonicalName
and preferredLabel (if they exist); if any of the labels exist, return it as a
result along with the count of topics with this label. The result should be
returned in descending order of the count of labels.</p>
          <p>Q6 (cp7, cp8, cp10, cp11): Retrieve all instances of class CreativeWork that
mention a geo-location within the boundaries of a given rectangular area. Along
with each retrieved creative work, retrieve the property geonamesId and its lat
(latitude) and long (longitude) values. Limit the result to 100 creative works
(geo-spatial query).</p>
          <p>Q7 (cp1, cp7, cp11): Retrieve the properties dateModif, title, category,
liveCoverage and audience for all creative works of a given type. The value of
property dateModif of the retrieved creative works should be within a certain
time range. Return 100 results ordered in ascending order by their dateModif.</p>
          <p>Q8 (cp3, cp4, cp9, cp11): Retrieve the graphs of resources that are instances
of class CreativeWork (considering also its subclasses), comprising each
resource's type, title, description, dateCreated, dateModified, category, the topic
it is about, its primaryContentOf and the latter's webDocumentType. Each of
the returned resources should contain a given string in its title or description
(full-text query).</p>
          <p>Q9 (cp2, cp7, cp10): Retrieve 10 similar creative works by calculating their
similarity score. The creative works should be ordered by the computed score
and in descending order of the value of property dateModified.</p>
          <p>Q10 (cp1, cp4, cp5, cp11): Retrieve creative works that mention locations in
the same province (A.ADM1) as a specified one. An additional constraint on a
time interval further limits the returned results.</p>
          <p>Q11 (cp1, cp4, cp5, cp7): Retrieve a list of the most recent creative works
that have been tagged with entities related to a specific popular entity from the
reference dataset. Relations can be inbound or outbound, explicit or inferred.</p>
          <p>Q12 (cp1, cp2, cp7, cp10): Retrieve the descriptions of the latest creative
works tagged with a specific location, considering that the description of each
creative work is stored in a dedicated named graph. The result should include
only the explicit statements about the creative work, without owl:sameAs
equivalence and without statements inferred otherwise.</p>
          <p>Table 3 shows the SPARQL features that each of the queries in the SPB query
mix implements. By complex filters we refer to filters that contain conjunctions
and disjunctions of predicates, and by negation we refer to the use of negation
with the bound SPARQL operator inside filters. Query
Q1 is a very complex query that contains 11 optionals, 4 of which have
nested optional clauses. Query Q3 contains the majority of the SPARQL features
(except aggregates, group by, and regular expressions in filters).</p>
          <p>Aggregation agents simulate the retrieval operations performed by
journalists, end-users or automated search engines by executing a mix of aggregation
queries of the following types: aggregation, search, statistical, full-text,
geo-spatial, analytical, drill-down and faceted search. Each aggregation agent
executes a mix of those query types in a constant loop until the benchmark run
finishes. Each agent executes a query and waits for the response (or a time-out);
after receiving the response, the next query is executed (consecutive queries
executed by an agent are not of the same type). The order of query execution is
pseudo-random, following an even distribution for each query defined in the
benchmark's configuration.
Update Queries. Editorial agents simulate the editorial work performed by
journalists, editors or automated text annotation engines by executing the
following update operations:
</p>
          <p>Table 3 (excerpt): SPARQL features used by queries Q1–Q3 ('-' means the feature is not used).</p>
          <preformat>
Feature           Q1  Q2  Q3
optionals          7   4   1
nested optionals   4   -   -
union              -   -   3
order by           1   -   1
distinct           1   -   1
limit              1   -   1
nested queries     1   -   1
group by           -   -   -
aggregates         -   -   -
negation           -   -   1
complex filters    -   -   2
regexp             -   -   -
          </preformat>
          <p>
– Insert operations generate new creative work descriptions (content
metadata) following the models and distribution rules defined in Section 2.2. Each
creative work is added to the database in a single transaction by executing
an insert SPARQL query.
– Update operations update an existing creative work. An update consists
of two actions executed in one transaction, following the BBC's use case for
updating a creative work: the first action deletes the context where the creative
work description resides, along with all its content; the second inserts the same
creative work (using its current ID) with all its properties, current and updated
ones.
– Delete operations delete an existing creative work, erasing the context
where the creative work resides along with all of its content.</p>
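<p>The context-per-creative-work layout and the delete-then-reinsert update can be sketched with a minimal named-graph store; the SPARQL Update forms in the comments show the corresponding operations, while the class itself is an illustrative stand-in, not the benchmark driver's code:</p>

```python
# A minimal named-graph store: each creative work's description lives in its
# own context (named graph), and an update drops the context and re-inserts
# the work under the same ID in one transaction.
class MiniStore:
    def __init__(self):
        self.contexts = {}  # graph IRI -> set of (s, p, o) triples

    def insert(self, context, triples):   # INSERT DATA { GRAPH <ctx> { ... } }
        self.contexts.setdefault(context, set()).update(triples)

    def delete(self, context):            # DELETE WHERE { GRAPH <ctx> { ?s ?p ?o } }
        self.contexts.pop(context, None)

    def update(self, context, triples):
        # One transaction, per the BBC use case: drop the old description,
        # then insert the creative work again with its current ID.
        self.delete(context)
        self.insert(context, triples)

store = MiniStore()
ctx = "graph:cwork42"
store.insert(ctx, {("cwork:42", "cwork:title", "Old title")})
store.update(ctx, {("cwork:42", "cwork:title", "New title"),
                   ("cwork:42", "cwork:dateModified", "2015-01-01")})
```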
          <p>Each editorial agent executes a mix of editorial operations in a constant
loop until the benchmark run has finished. The operations executed by an
agent are chosen pseudo-randomly following the distribution: 80% INSERT,
10% UPDATE and 10% DELETE operations. This distribution follows a pattern
similar to live datasets, where a massive amount of new data is added to the
database (inserts and updates) and a minor amount of data is deleted from it.
These rates are freely configurable, so each audited run should include a
statement of the exact distribution values used.
</p>
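<p>The pseudo-random choice of operations can be sketched by mapping a uniform draw onto the default 80/10/10 distribution; the function name and thresholds below are an illustrative model, not the driver's implementation:</p>

```python
import random

def editorial_operation(u: float) -> str:
    """Map a uniform draw u in [0, 1) to an editorial operation using the
    default SPB distribution: 80% INSERT, 10% UPDATE, 10% DELETE
    (the rates are configurable in the benchmark)."""
    if u < 0.80:
        return "INSERT"
    if u < 0.90:
        return "UPDATE"
    return "DELETE"

# A seeded generator makes the agent's operation sequence reproducible.
rng = random.Random(7)
ops = [editorial_operation(rng.random()) for _ in range(10_000)]
```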
        </sec>
        <sec id="sec-2-1-3">
          <title>Performance Metrics</title>
          <p>The performance metrics produced by SPB describe how fast an RDF
database can execute queries (by simultaneously running aggregation agents)
while at the same time executing editorial operations (by simultaneously running
editorial agents) over the generated data stored in the RDF database. The data
generator of SPB is capable of producing data of different sizes (up to billions
of triples), thus allowing performance results to be obtained at various scales.
SPB outputs two types of performance metrics:
1. minimum, maximum and average execution times for each individual query
and editorial operation during the whole benchmark run;
2. average execution rate per second for all queries and editorial operations.</p>
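<p>Both kinds of metrics can be derived from a per-operation log of execution times; a minimal sketch, with assumed field names:</p>

```python
def query_metrics(times_ms, run_seconds):
    """Per-operation metrics in the spirit of SPB's output: minimum, maximum
    and average execution time, plus the average execution rate over the run."""
    return {
        "min_ms": min(times_ms),
        "max_ms": max(times_ms),
        "avg_ms": sum(times_ms) / len(times_ms),
        "per_sec": len(times_ms) / run_seconds,  # average execution rate
    }

# Three executions of one query during a 60-second run.
m = query_metrics([120.0, 80.0, 100.0], run_seconds=60)
```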
          <p>Performance metrics are produced per second during the benchmark run and
are saved to log files. Furthermore, all the information related to query execution
and query results is saved in log files at different levels of detail. Regarding
performance metric (1), due to space restrictions we only present the average
execution times for the SPB workload in the experimental evaluation; all the
results are available online at http://ldbcouncil.org/benchmarks/spb.
</p>
        </sec>
        <sec id="sec-2-1-4">
          <title>Running the benchmark</title>
          <p>SPB comes with test and benchmark drivers, both distributed as a jar file
along with the BBC ontologies and reference datasets, the queries and updates
discussed earlier, and the configuration parameters for running the benchmark
and generating the data. The data generator uses configuration files that must
be edited appropriately to set the dataset size to produce, the number of
aggregation and editorial agents, the query timeout, etc. The distributions used
by the data generator to produce the datasets can also be edited. The benchmark,
which is available on GitHub (https://github.com/ldbc/ldbc_spb_bm_2.0), is very
simple to run from the command line once the RDF repository used to store
the ontologies and the reference datasets is set up and the configuration files
are updated appropriately.</p>
          <p>The test driver generates substitution parameters for the SPB queries during
data generation. The use of these parameters ensures that the benchmark is
deterministic, i.e., different runs of the benchmark employ the same values
and hence produce comparable results. During one benchmark run the driver
goes through an initialization phase where one instance per benchmark query is
created by selecting parameters from the respective query parameter file. Once
a query has used the last set of parameters, its next execution (within the same
run) reiterates from the first set of parameters in the list.</p>
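<p>This cycling through a fixed parameter list is what makes runs deterministic and comparable; it can be sketched in a few lines (the parameter values below are hypothetical, not taken from the real parameter files):</p>

```python
from itertools import cycle, islice

# Hypothetical substitution parameter sets for a geo-spatial query; the real
# files are generated alongside the data.
params = [{"lat": 40.6, "long": 22.9}, {"lat": 51.5, "long": -0.1}]

# The driver walks the list and, once the last set is used, reiterates from
# the first one, so every run sees the same deterministic sequence.
draws = list(islice(cycle(params), 5))
```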
          <p>The benchmark driver allows one to specify the size of the sets of substitution
parameters per query, which are saved in files along with the generated data. The
benchmark produces three kinds of files that contain (a) brief information about
each executed query, the size of the returned result and the execution time, (b)
the detailed log of each executed query and its result, and (c) the benchmarking
results.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>Evaluation</title>
        <p>
          In this section we report the results we obtained when running SPB for the RDF
engines Virtuoso [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] and GraphDB [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. We used the Virtuoso Opensource
Version 7.50.3213. All the experiments that we conducted for Virtuoso, run in
a server with 2 Intelr Xeonr CPU E5-2630 at 2.30GHz (a total of 12 cores/24
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>7http://ldbcouncil.org/benchmarks/spb 8https://github.com/ldbc/ldbc_spb_bm_2.0</title>
      <p>threads) with 192GB of RAM and running CentOS release 6.2 x86_64. The
system has 6 2TB SATA-2 and 2 512GB SATA-3 disks under Patsburg 6-Port
SATA AHCI Controller.</p>
      <p>We used GraphDB, a semantic graph database management system, Version
6.2 Enterprise Edition. The experiments conducted for GraphDB ran on a
single Intel Xeon E5-1650 v3 CPU at 3.5GHz (a total of 6 cores/12 threads)
with 64GB of main memory, running Ubuntu 14.04 x86_64. The system has
one 500GB 7200RPM SATA-3 disk for the operating system only and two
400GB Samsung 845DC PRO SSDs.</p>
      <p>Our aim is not to make a detailed comparison of the performance of the
different systems, but to provide an idea of how they perform on the SPB
benchmark; this is why we did not use the same configuration for Virtuoso
and GraphDB. A complete description of the experiments reported here can
also be found online (see Section 2.5).</p>
      <p>Datasets: For our experiments we produced three datasets of different scales
that correspond to 64M (SF1), 256M (SF3) and 1B (SF5) triples. For GraphDB
we publish results for SF1 (64M) and SF3 (256M) only, based on the
most common industry use-cases, where commodity hardware is widely
adopted. Table 4 shows the statistics of the datasets produced by the SPB data
generator that we use for our experiments. We provide the number of
explicit (i.e., generated) triples and the total number of triples (explicit plus
implicit) for each dataset, as well as the number of creative works, since these
are the resources requested and updated by the SPB workload. GraphDB supports
forward chaining and computes the closure of the dataset during loading,
and consequently we report the implicit triples for it. The slight deviation in
numbers comes from the configuration of the data generator, e.g. how many
threads have been configured to generate the creative works. The data generator
produces creative works, each resource associated with 25 to 29 triples, and
threads stop producing creative works once the dataset size specified in the
configuration file has been reached. Nevertheless, if a thread is in the middle of
generating a creative work, it will finish that last resource before ending.</p>
      <p>Table 4 statistics (#explicit, #implicit, #total, #cworks):
Virtuoso SF1: 63.709.210, N/A, 63.709.210, 1.489.675
Virtuoso SF3: 255.659.835, N/A, 255.659.835, 8.821.474
Virtuoso SF5: 999.609.164, N/A, 999.609.164, 37.233.029
GraphDB SF1: 64.000.000, 69.112.836, 133.112.836, 1.489.701
GraphDB SF3: 256.000.000, 226.240.341, 482.240.341, 8.821.390</p>
      <p>
Configuration: For Virtuoso we measured the performance results on
enterprise-grade hardware with an SPB driver configuration of 22 read/2 write threads,
while GraphDB chose a modest system configuration and an SPB
driver configuration of 8 read/2 write threads (see Table 5 for the configuration details).
Both vendors were free to choose their optimal configuration of reading and
writing threads (aggregation and editorial agents) of the SPB driver, as well as
to select the system configuration used for running the benchmark,
in order to achieve the best performance results.</p>
      <p>Table 5 driver configuration (Virtuoso SF1/SF3/SF5; GraphDB SF1/SF3):
Aggregation agents: 22, 22, 22; 8, 8
Editorial agents: 2, 2, 2; 2, 2
Data generator workers: 4, 4, 4; 8, 8
Warm-up period (min): 10, 10, 5; 10, 10
Duration (min): 20, 30, 20; 20, 30
Timeout (min): 5, 5, 5; 5, 5</p>
      <p>
Workload: In our experiments we used the Basic Version of SPB, which
consists of the 12 queries discussed in Section 2.3. Recall that SPB queries are
expressed in SPARQL 1.1. Since Virtuoso employs backward
reasoning, for queries that involve reasoning we used the transitive option
and enabled the rdfs_rule_set option, which implements the traversal of RDFS
hierarchies along the rdfs:subClassOf and rdfs:subPropertyOf relations.
Data Generation and Loading Times: The time needed to load the
ontologies and reference datasets in both systems is in the order of a few
milliseconds and we do not report it here. The dataset generation and loading times are
shown in Table 6. Data generation is slower on Virtuoso’s hardware because
Virtuoso’s server uses HDD drives while the hardware configuration for GraphDB
uses SSD drives, and the data generation process is intensive in terms of I/O
operations to the storage device. Regarding the loading times, GraphDB loads data
much more slowly (in the case of SF3 the difference is up to one order of magnitude)
because of its forward-chaining reasoner, which materializes the inferred facts
when new ones are added to the database, something not
done by Virtuoso.</p>
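      <p>For illustration, the backward-reasoning options mentioned above are
typically activated per query in Virtuoso through a pragma naming a rule set
previously declared with rdfs_rule_set (a hedged sketch; the rule-set name,
graph IRI and class IRI below are illustrative, not taken from the benchmark):</p>
      <p>
```sparql
# Assumes a rule set declared once on the server, e.g.
#   rdfs_rule_set ('spb_rules', 'http://example.org/spb/ontologies');
# (the rule-set name and IRIs here are hypothetical)
DEFINE input:inference "spb_rules"
SELECT ?work
WHERE { ?work a <http://example.org/creativework/NewsItem> }
```
      </p>
      <p>With such a pragma the query also matches instances of subclasses of the
named class at run time, without any materialized implicit triples in the store.</p>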
      <p>Performance Metrics: Table 7 shows the average execution rates per
second and Table 8 provides the detailed execution times for each of the 12
benchmark queries. These results are provided for both engines and for all of the
scale factors (SF1, SF3 and SF5). For SF1, Virtuoso executes 150 queries per
second on 22 aggregation agents (reading threads), whereas GraphDB achieves
about 100 queries per second using 8 reading threads.</p>
      <p>A large difference in the update rate exists between the two engines. Recall that,
as described in Section 2.3, the editorial workload consists of 80% insertions of
new facts, 10% updates and 10% deletions of existing ones. GraphDB’s
low update rate is due to the fact that it uses a forward-chaining materialization
of inferred statements which adds an overhead when inserting new data.</p>
      <p>Virtuoso uses backward-chaining to compute the necessary implicit triples
during query execution; this is performed in memory, and inserting a new fact in
the database does not trigger any materialization of additional knowledge.
Consequently, the update rate is higher. This approach saves time during insertion
of new data, but can be costly during query execution: all queries that require
materialization (Q1-Q8 and Q11) are slower in Virtuoso than in GraphDB, as
the materialization takes place during query execution.</p>
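      <p>The trade-off between the two strategies can be illustrated with a toy
rdfs:subClassOf hierarchy (a simplified sketch, not the engines’ actual
implementations; the class names are invented): forward chaining writes the
implicit rdf:type triples at insert time, while backward chaining stores only the
explicit triple and expands the class hierarchy at query time.</p>
      <p>
```python
SUBCLASS = {"NewsItem": "CreativeWork"}  # toy rdfs:subClassOf hierarchy

def superclasses(cls):
    """All transitive superclasses of cls, including cls itself."""
    out = [cls]
    while cls in SUBCLASS:
        cls = SUBCLASS[cls]
        out.append(cls)
    return out

# Forward chaining (GraphDB-style): materialize implicit triples on insert.
def insert_forward(store, subject, cls):
    for c in superclasses(cls):
        store.add((subject, "rdf:type", c))

# Backward chaining (Virtuoso-style): store only the explicit triple.
def insert_backward(store, subject, cls):
    store.add((subject, "rdf:type", cls))

def query_backward(store, cls):
    # A query for ?s rdf:type cls must also match any subclass of cls.
    subs = [c for c in SUBCLASS if cls in superclasses(c)] + [cls]
    return {s for (s, p, c) in store if p == "rdf:type" and c in subs}

fwd, bwd = set(), set()
insert_forward(fwd, "doc1", "NewsItem")   # writes 2 triples: insert is slower
insert_backward(bwd, "doc1", "NewsItem")  # writes 1 triple: insert is faster
# Both approaches answer a query on CreativeWork with "doc1"; backward
# chaining simply pays the hierarchy-expansion cost at query time instead.
```
      </p>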
      <p>Further increasing the scale factor of the generated data from SF1 to SF3, which
effectively consists of 239M generated statements and 25M statements of
reference data, shows a drop in query performance for both GraphDB (60%) and
Virtuoso (45%), which is expected because of the four times larger amount of
data that the SPB benchmark operates on.</p>
      <p>
          In the reported experiments, Virtuoso stores RDF triples in a single RDF
quad table with Virtuoso’s default index configuration and considers hash join
as a possible join type. Considering a hash join as a possibility always slows down
query compilation, and improves some queries while slowing others down, without greatly
affecting the average throughput. We ran additional (unaudited) experiments with
Virtuoso in which we introduced query plan caching, which raises the Virtuoso
score from 80 queries per second (qps) to 144 qps for SF3. The rationale for
considering hash join in the first place is that analytical workloads such as SPB’s
heavily rely on it. A good TPC-H score is simply infeasible without hash joins
(see http://www.openlinksw.com/dataspace/doc/oerling/weblog/Orri%20Erling%27s%20Blog/1856), and
hash join is indispensable if RDF is to be a serious contender beyond serving
lookups. The decision to use it, however, depends on accurate cardinality
estimates on either side of the join. Previous work [
        <xref ref-type="bibr" rid="ref19">19</xref>
          ] advocates doing away altogether
with a cost model that is unreliable and hard to devise for RDF datasets.
The present Virtuoso approach is that falling back to rule-based optimization is not
the preferred solution; rather, characteristic sets [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] are used for reducing triples
into wider tables, which also cuts down on the plan search space and increases the
reliability of cost estimation. Looking at execution alone, we see that actual
database operations are low in the profile, with memory management taking the
top 19%. This is due to construct queries allocating small blocks for returning
graphs, which is entirely avoidable.
      </p>
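      <p>The characteristic-sets idea referenced above, i.e. grouping subjects by
the exact set of properties they emit so that each group can be stored as a
wider table with more reliable cardinality statistics, can be sketched as
follows (a toy illustration; the example triples are invented):</p>
      <p>
```python
from collections import defaultdict

def characteristic_sets(triples):
    """Group subjects by the exact set of properties they use.

    Returns {frozenset(properties): [subjects...]}; each group could be
    stored as one wide table with one column per property, and the group
    size gives an exact cardinality for star-shaped query patterns.
    """
    props = defaultdict(set)
    for s, p, o in triples:
        props[s].add(p)
    groups = defaultdict(list)
    for s, ps in props.items():
        groups[frozenset(ps)].append(s)
    return groups

triples = [
    ("cw1", "title", "A"), ("cw1", "about", "e1"),
    ("cw2", "title", "B"), ("cw2", "about", "e2"),
    ("p1", "name", "N"),
]
groups = characteristic_sets(triples)
# Two characteristic sets: {title, about} with two subjects, {name} with one.
```
      </p>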
      <sec id="sec-3-1">
        <title>State of the art</title>
        <p>In this section we provide a short overview of the existing RDF
benchmarks, and an analysis based on the following dimensions: the datasets
(including data generators in the case of synthetic benchmarks) and the query
workloads. In our presentation we distinguish between benchmarks that use real
datasets and those that produce synthetic datasets using special-purpose data
generators.</p>
        <p>
          DBPedia (http://dbpedia.org/sparql), UniProt KnowledgeBase (UniProtKB) [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] and YAGO [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] are
the most well known and widely used real datasets for benchmarking RDF
engines. DBPedia extracts structured information from Wikipedia to create a large
dataset for benchmarking RDF engines. Although the DBPedia dataset is one
of the reference datasets for the Linked Open Data Cloud, there is no clear
set of queries for it. The query workload named DBPSB (DBPedia SPARQL
Benchmark) was proposed by the University of Leipzig [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] and was derived from
the DBPedia query logs; it consists of mostly simple lookup queries and does not
consider more complex features such as inference. UniProt [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], a high-quality
dataset describing protein sequences and related functional information is
expressed in RDF. It is one of the central datasets in the bio-medical part of the
Linked Open Data Cloud, and uses an OWL ontology expressed in a sub-language
of OWL-Lite. The queries are mainly lookup queries [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] but some are also used
to test the reasoning capabilities of RDF databases (e.g., taxonomic queries).
Finally, YAGO [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] is another knowledge base that integrates statements from
Wikipedia, WordNet, WordNet Domains, Universal WordNet and GeoNames
ontologies. Similar to UniProt and DBPedia, the YAGO dataset is not accompanied
by a set of queries. Neumann et al. provided eight mostly lookup and join queries
for an earlier version of the YAGO ontology, for benchmarking the RDF-3X
engine [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ].
        </p>
        <p>SPB uses real reference datasets from the BBC to produce synthetic datasets of
arbitrary size (up to billions of triples) that mimic the characteristics of the
former. The workload comprises both real and synthetic queries and updates that
cover all SPARQL 1.0 operators, and it is more complex than the workloads
proposed in the previously discussed benchmarks.</p>
        <p>
          In addition to benchmarks using real-world datasets, synthetic RDF
benchmarks have been proposed in the literature. The Lehigh University Benchmark
(LUBM) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] uses a simple university domain ontology, provides a scalable data
generator that can produce several millions of triples, and a set of test queries.
The LUBM data is regular and hence the benchmark does not explore any of
RDF’s distinguishing properties that pose interesting challenges for query
optimizers. LUBM’s workload consists of mainly simple lookup and join queries that
retrieve only data triples. The University Ontology Benchmark (UOBM) [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] is
based on LUBM; it tackles complex inference and includes queries that address
scalability issues in addition to those studied by LUBM. UOBM uses schemas
that introduce constructs from OWL Lite and OWL DL sublanguages of OWL.
In contrast to LUBM, UOBM queries are designed specifically to consider
multiple lookups and complex joins, and it is required that the queries also include at
least one different type of OWL inference. SP2Bench [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] builds upon the DBLP
simple bibliographic schema. It comes with a generator that produces arbitrarily
large datasets by taking into account the constraints expressed in terms of this
schema. Unfortunately, the produced data is again relational-like in the sense
that the data is well structured, and no RDF schema constructs are
considered. The queries employ different SPARQL 1.0 operators and contain long path
chains and bushy patterns, in addition to complex combinations of SPARQL
filter, optional and bound expressions. Finally, the Berlin SPARQL Benchmark
(BSBM) [
          <xref ref-type="bibr" rid="ref1 ref12">1, 12</xref>
          ] is a broadly accepted and used benchmark built around an
ecommerce scenario. It provides a scalable data generator and a test driver, as
well as a set of queries that measure the performance of RDF engines for very
large datasets but not their ability to perform reasoning tasks. BSBM is the first
benchmark to propose a test driver that can be used in a systematic way to test
the performance of RDF engines.
        </p>
        <p>SPB goes one step beyond the aforementioned benchmarks: its
generator can produce arbitrarily large datasets (in the order of billions of triples),
and it proposes a much more complex workload than the previous ones, consisting of queries
that contain a fairly large number of triple patterns, cover all SPARQL 1.0
operators, and include nested queries of high complexity.</p>
        <p>In this paper we presented the Semantic Publishing Benchmark (SPB)
developed in the context of the Linked Data Benchmark Council (LDBC) European
project, inspired by BBC’s “Dynamic Semantic Publishing” (DSP) concept. We
presented in detail the benchmark’s schema, data generator, query workload and
performance metrics, and reported on the results of a set of experiments we
conducted against the Virtuoso and GraphDB RDF engines. In the next versions
of SPB we plan to use more expressive OWL ontologies for representing
information. We also plan to improve the modelling of content-to-entity cardinalities
so as to enrich the query sets, and to derive realistic query frequencies, by taking
advantage of FT data and query logs respectively.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Acknowledgments</title>
        <p>The work presented in this paper was funded by the FP7 project LDBC (#317548).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schultz</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <source>The Berlin SPARQL Benchmark. IJSWIS</source>
          <volume>5</volume>
          (
          <issue>2</issue>
          ) (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Boncz</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          :
          <article-title>LDBC: benchmarks for graph and RDF data management</article-title>
          .
          <source>In: IDEAS</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Boncz</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erling</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark</article-title>
          . In: TPCTC (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fundulaki</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martinez</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angles</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bishop</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotsev</surname>
          </string-name>
          , V.:
          <article-title>D2.2.2 Data Generator</article-title>
          .
          <source>Tech. rep., Linked Data Benchmark Council</source>
          (
          <year>2013</year>
          ), available at http://ldbcouncil.org/sites/default/files/LDBC_D2.2.2.pdf
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ma</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xie</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Towards a Complete OWL Ontology Benchmark</article-title>
          . In: ESWC (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Morsey</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ngomo</surname>
            ,
            <given-names>A.C.N.</given-names>
          </string-name>
          :
          <article-title>DBpedia SPARQL Benchmark - Performance assessment with real queries on real data</article-title>
          .
          <source>In: ISWC</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moerkotte</surname>
          </string-name>
          , G.:
          <article-title>Characteristic Sets: Accurate Cardinality Estimation for RDF Queries with Multiple Joins</article-title>
          . In: ICDE (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Neumann</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>The RDF-3X engine for scalable management of RDF data</article-title>
          .
          <source>VLDB Journal</source>
          <volume>19</volume>
          (
          <issue>1</issue>
          ) (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Pham</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boncz</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erling</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          :
          <article-title>S3G2: a Scalable Structure-correlated Social Graph Generator</article-title>
          . In: TPCTC (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Redaschi</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Consortium</surname>
            ,
            <given-names>U.</given-names>
          </string-name>
          , et al.:
          <article-title>UniProt in RDF: Tackling data integration and distributed annotation with the semantic web</article-title>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Schmidt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hornung</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lausen</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pinkel</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>SP2Bench: A SPARQL performance benchmark</article-title>
          .
          <source>In: ICDE</source>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12. Berlin SPARQL Benchmark (BSBM). http://wifo5-03.informatik.uni-mannheim.de/bizer/berlinsparqlbenchmark/
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. DBPSB. http://aksw.org/Projects/DBPSB</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. GraphDB. http://ontotext.com/products/ontotext-graphdb/graphdb-standard/</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. LUBM. http://swat.cse.lehigh.edu/projects/lubm</mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>16. UniProtKB Queries. http://www.uniprot.org/help/query-fields</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>17. Virtuoso Universal Server. http://virtuoso.openlinksw.com/</mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Suchanek</surname>
            ,
            <given-names>F.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kasneci</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weikum</surname>
          </string-name>
          , G.:
          <article-title>Yago: a core of semantic knowledge</article-title>
          .
          <source>In: WWW</source>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Tsialiamanis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sidirourgos</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fundulaki</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christophides</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boncz</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          :
          <article-title>Heuristics-based query optimisation for SPARQL</article-title>
          . In: EDBT (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>