<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TripleRush: A Fast and Scalable Triple Store</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Philip Stutz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mihaela Verman</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lorenz Fischer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Abraham Bernstein</string-name>
          <email>bernsteing@ifi.uzh.ch</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DDIS, Department of Informatics, University of Zurich</institution>
          ,
          <addr-line>Zurich</addr-line>
          ,
          <country country="CH">Switzerland</country>
        </aff>
      </contrib-group>
      <fpage>50</fpage>
      <lpage>65</lpage>
      <abstract>
<p>TripleRush is a parallel in-memory triple store designed to address the need for efficient graph stores that quickly answer queries over large-scale graph data. To that end it leverages a novel, graph-based architecture. Specifically, TripleRush is built on our parallel and distributed graph processing framework Signal/Collect. The index structure is represented as a graph where each index vertex corresponds to a triple pattern. Partially matched copies of a query are routed in parallel along different paths of this index structure. We show experimentally that TripleRush takes less than a third of the time to answer queries compared to the fastest of three state-of-the-art triple stores, when measuring time as the geometric mean of all queries for two benchmarks. On individual queries, TripleRush is up to three orders of magnitude faster than other triple stores.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
Many applications such as social network analysis, monitoring of financial
transactions or analysis of web pages and their links require large-scale graph
computation. To address this need, many have researched the development of efficient
triple stores [
        <xref ref-type="bibr" rid="ref1 ref11 ref14">1, 14, 11</xref>
]. These systems borrow from the database literature to
investigate efficient means for storing large graphs and retrieving subgraphs, which
are usually defined via a pattern matching language such as SPARQL. This
avenue has had great success both in research [
        <xref ref-type="bibr" rid="ref1 ref11 ref14">1, 14, 11</xref>
] and practice.1 However,
many of these systems are built like a centralized database, raising the question
of scalability and parallelism within query execution. One way to increase the
efficiency of parallel pipelined joins in such centralized databases is the use of
sideways information passing [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Other approaches focus on adapting triple stores to a distributed setting:
MapReduce [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] has been used to aggregate results from multiple single-node
RDF stores in order to support distributed query processing [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
      <p>
        Others have mapped SPARQL query execution pipelines to chains of
MapReduce jobs (e.g., [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]). Whilst this provides scalability, the authors point out that
the rigid structure of MapReduce and the high latencies when starting new jobs
constrain the possibilities for optimizations [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Distributed graph processing
frameworks such as Pregel [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], GraphLab/Powergraph [
        <xref ref-type="bibr" rid="ref4 ref7">7, 4</xref>
        ], Trinity [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and
1 http://virtuoso.openlinksw.com, http://www.ontotext.com/owlim
our own Signal/Collect [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] can offer more flexibility for scalable querying
of graphs.
      </p>
      <p>
        Trinity.RDF [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] is, to our knowledge, the only existing triple store to be
implemented on top of such an abstraction. It is a graph engine for
SPARQL queries that was built on the Trinity distributed graph processing
system. To answer queries, Trinity.RDF represents the graph with adjacency
lists and combines traditional query processing with graph exploration.
      </p>
      <p>
        In this paper we introduce TripleRush, a triple store which is based on an
index graph, where a basic graph pattern SPARQL query is answered by
routing partially matched query copies through this index graph. Whilst traditional
stores pipe data through query processing operators, TripleRush routes query
descriptions to data. For this reason, TripleRush does not use any joins in the
traditional sense but searches the index graph in parallel. We implemented TripleRush
on top of Signal/Collect, a scalable, distributed, parallel and vertex-centric
graph processing framework [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>The contributions of this paper are the following: First, we present the
TripleRush architecture, with an index graph consisting of many active
processing elements. Each of these elements contains a part of the processing logic
as well as a part of the index. The result is a highly parallel triple store based on
graph-exploration. Second, as a proof of concept, we implemented the TripleRush
architecture within our graph processing framework Signal/Collect, benefiting
from transparent parallel scheduling, efficient messaging between the active
elements, and the capability to modify a graph during processing. Third, we
evaluated our implementation and compared it with three other state-of-the-art
in-memory triple stores using two benchmarks based on the LUBM and
DBPSB datasets. We showed experimentally that TripleRush outperforms the
other triple stores by a factor ranging from 3.7 to 103 times in the geometric
mean of all queries. Fourth, we evaluated and showed data scalability for the
LUBM benchmark. Fifth, we measured memory usage for TripleRush, which is
comparable to that of traditional approaches. Last, we open sourced our
implementation.2</p>
      <p>In the next section we discuss the infrastructural underpinnings of TripleRush.
This is followed by a description of the TripleRush architecture, as well as the
functionality and interactions of its building blocks. We continue with a
description of the optimizations that reduce memory usage and increase performance.
We evaluate our implementation, discuss some of this paper's findings as well as
limitations, and finish with some closing remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>Signal/Collect</title>
      <p>In this section we describe the scalable graph processing system
Signal/Collect and some of the features that make it a suitable foundation for TripleRush.
2 Apache 2.0 licensed, https://github.com/uzh/triplerush</p>
      <p>
        Signal/Collect [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]3 is a parallel and distributed large-scale graph
processing system written in Scala. Akin to Pregel [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], it allows one to specify graph
computations in terms of vertex-centric methods that describe the aggregation of
received messages (collecting) and the propagation of new messages along edges
(signalling). The model is suitable for expressing data-flow algorithms, with vertices
as processing stages and edges that determine message propagation. In contrast
to Pregel and other systems, Signal/Collect supports different vertex types
for different processing tasks. Another key feature, also present in Pregel, is that
the graph structure can be changed during the computation. The framework
transparently parallelizes and distributes the processing of data-flow algorithms.
Signal/Collect also supports features such as bulk-messaging and Pregel-like
message combiners to increase the message-passing efficiency.
      </p>
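<p>To make the model concrete, the following is a minimal sketch of the vertex-centric signal/collect cycle described above. It is a hypothetical Python rendering with invented names; the actual Signal/Collect API is in Scala and considerably richer (typed vertices, asynchronous scheduling, distribution):</p>

```python
# Minimal sketch of a vertex-centric signal/collect cycle (hypothetical
# Python rendering; names and the synchronous stepping are assumptions).

class Vertex:
    def __init__(self, vid, state=0):
        self.id = vid
        self.state = state
        self.edges = []      # target vertex ids
        self.inbox = []      # signals received since the last collect

    def collect(self):
        # Aggregate the received signals into the vertex state.
        self.state += sum(self.inbox)
        self.inbox.clear()

    def signal(self, graph):
        # Propagate a new signal along every outgoing edge.
        for target in self.edges:
            graph[target].inbox.append(self.state)

def step(graph):
    # One synchronous superstep; Signal/Collect can also schedule
    # signalling and collecting asynchronously, per vertex.
    for v in graph.values():
        v.signal(graph)
    for v in graph.values():
        v.collect()
```

<p>Here every vertex simply sums incoming signals; Signal/Collect generalizes this to arbitrary vertex types with their own collect and signal logic, which is what TripleRush exploits.</p>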
      <p>Most graph processing systems work according to the bulk-synchronous
parallel model. In such a system, all components act in lock-step and the slowest part
determines the overall progress rate. In a query processing use-case, this means
that one expensive partial computation would slow down all the other ones that
are executed in the same step, which leads to increased latency.
Signal/Collect supports asynchronous scheduling, where different partial computations
progress at their own pace, without a central bottleneck. The system is based
on message-passing, which means that no expensive resource locking is required.
These two features are essential for low-latency query processing.</p>
      <p>With regard to the representation of edges, the framework is flexible. A vertex
can send messages to any other vertex: Edges can either be represented explicitly
or messages may contain vertex identifiers from which virtual edges are created.
TripleRush uses this feature to route query messages.</p>
    </sec>
    <sec id="sec-3">
      <title>TripleRush</title>
      <p>The core idea of TripleRush is to build a triple store with three types of
Signal/Collect vertices: Each index vertex corresponds to a triple pattern, each
triple vertex corresponds to an RDF triple, and query vertices coordinate query
execution. Partially matched copies of queries are routed in parallel along
different paths of this structure. The index graph is, therefore, optimized for efficient
routing of query descriptions to data and its vertices are addressable by an ID,
which is a unique [ subject predicate object ] tuple.</p>
      <p>We first describe how the graph is built and then explain the details of how
this structure enables efficient parallel graph exploration.</p>
      <sec id="sec-3-1">
        <title>Building the Graph</title>
        <p>The TripleRush architecture is based on three different types of vertices.
Index and triple vertices form the index graph. In addition, the TripleRush graph
contains a query vertex for every query that is currently being executed. Fig. 1
shows the index graph that is created for the triple [ Elvis inspired Dylan ]:
[Fig. 1: The index graph created for the triple [ Elvis inspired Dylan ], with index vertices on levels 1 to 3 and the triple vertex on level 4.]
3 http://uzh.github.io/signal-collect/
Triple vertices are illustrated on level 4 of Fig. 1 and represent triples in the
database. Each contains subject, predicate, and object information.
Index vertices, illustrated in levels 1 to 3 in Fig. 1, represent triple patterns
and are responsible for routing partially matched copies of queries (referred
to as query particles) towards triple vertices that match the pattern of the
index vertex. Index vertices also contain subject, predicate, and object
information, but one or several of them are wildcards. For example, in Fig. 1
the index vertex [ * inspired * ] (in the middle of the figure on level 2) routes
to the index vertex [ * inspired Dylan ], which in turn routes to the triple
vertex [ Elvis inspired Dylan ].</p>
        <p>Query vertices, depicted in the example in Fig. 2, are query dependent. For
each query that is being executed, a query vertex is added to the graph. The
query vertex emits the first query particle that traverses the index graph. All
query particles, successfully matched or not, eventually get routed back to
their respective query vertex. Once all query particles have succeeded or
failed, the query vertex reports the results and removes itself from the graph.</p>
        <p>The index graph is built by adding a triple vertex for each RDF triple that
is added to TripleRush. These vertices are added to Signal/Collect, which
turns them into parallel processing units. Upon initialization, a triple vertex will
add its three parent index vertices (on level 3) to the graph and add an edge
from these index vertices to itself. Should any parent index vertex already exist,
then only the edge is added from this existing vertex.</p>
        <p>When an index vertex is initialized, it adds its parent index vertex, as well as
an edge from this parent index vertex to itself. Note that the parent index vertex
always has one more wildcard than its child. The construction process continues
recursively until the parent vertex has already been added or the index vertex
has no parent. In order to ensure that there is exactly one path from an index
vertex to all triple vertices below it, an index vertex adds an edge from at most
one parent index vertex, always according to the structure illustrated in Fig. 1.</p>
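<p>The recursive derivation of index-vertex IDs from a triple can be sketched as follows. This is an illustrative Python sketch: None stands for the wildcard "*", and the choice of which single parent each index vertex actually links from (fixed by the structure of Fig. 1) is not reproduced here.</p>

```python
# Sketch: deriving the index-vertex IDs reachable from one triple.
# A pattern is a (subject, predicate, object) tuple; None is a wildcard.

def level3_parents(triple):
    """The three index patterns obtained by wildcarding one position."""
    s, p, o = triple
    return [(None, p, o), (s, None, o), (s, p, None)]

def one_more_wildcard(pattern):
    """All patterns derivable by wildcarding one additional position."""
    return [pattern[:i] + (None,) + pattern[i+1:]
            for i in range(3) if pattern[i] is not None]

triple = ("Elvis", "inspired", "Dylan")
index_ids = {triple}
frontier = level3_parents(triple)
while frontier:
    pattern = frontier.pop()
    if pattern not in index_ids:
        index_ids.add(pattern)
        frontier.extend(one_more_wildcard(pattern))

# index_ids now holds the triple plus all seven index patterns of Fig. 1,
# from [ Elvis inspired * ] up to the root [ * * * ].
```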
        <p>Next we describe how the index graph allows parallel graph exploration in
order to match SPARQL queries.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Query Processing</title>
        <p>The index graph we just described is different from traditional index structures,
because it is designed for the efficient parallel routing of messages to triples that
correspond to a given triple pattern. All vertices that form the index structure
are active parallel processing elements that only interact via message passing.</p>
        <p>A query is defined by a list of SPARQL triple patterns. Each query execution
starts by adding a query vertex to the TripleRush graph. Upon initialization, a
query vertex emits a single query particle. A query particle consists of the list of
unmatched triple patterns, the ID of its query vertex, a list of variable bindings,
a number of tickets, and a flag that indicates if the query execution was able
to explore all matching patterns in the index graph. Next, we describe how the
parts of the query particle are modified and used during query execution.</p>
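<p>The contents of a query particle can be pictured as follows (a hypothetical Python rendering; the field names are invented, and the default ticket endowment follows footnote 5):</p>

```python
# Hypothetical sketch of a query particle's contents (names invented).
from dataclasses import dataclass, field

@dataclass
class QueryParticle:
    query_vertex_id: int    # where results are reported back to
    unmatched: list         # remaining triple patterns, in order
    bindings: dict = field(default_factory=dict)  # variable name to value
    tickets: int = 2**63 - 1   # Long.MaxValue, per footnote 5
    complete: bool = True      # False if some matching paths went unexplored
```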
        <p>The emitted particle is routed (by Signal/Collect) to the index vertex
that matches its first unmatched triple pattern. If that pattern is, for example,
[ Elvis inspired ?person ], where ?person is a variable, then it will be sent to the
index vertex with ID [ Elvis inspired * ]. This index vertex then sends copies of
the query particle to all its child vertices.</p>
        <p>Once a query particle reaches a triple vertex, the vertex attempts to match
the next unmatched query pattern to its triple. If this succeeds, then a variable
binding is created and the remaining triple patterns are updated with the new
binding. If all triple patterns are matched or a match failed,4 then the query
particle gets routed back to its query vertex. Otherwise, the query particle gets
sent to the index or triple vertex that matches its next unmatched triple pattern.</p>
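<p>The matching step performed at a vertex can be sketched as follows, assuming variables are marked with a leading question mark (a minimal Python sketch, not the actual dictionary-encoded implementation):</p>

```python
# Sketch of binding a triple pattern against a concrete triple. Returns the
# updated bindings, or None when the match fails, e.g. on a conflicting
# binding as in footnote 4: [ ?X inspired ?X ] cannot bind ?X to both
# Elvis and Dylan for the triple [ Elvis inspired Dylan ].

def try_match(pattern, triple, bindings):
    new = dict(bindings)
    for pat, val in zip(pattern, triple):
        if isinstance(pat, str) and pat.startswith("?"):  # a variable
            if new.get(pat, val) != val:
                return None    # conflicting binding for this variable
            new[pat] = val
        elif pat != val:
            return None        # constant position does not match
    return new
```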
        <p>If no index vertex with a matching ID is found, then a handler for
undeliverable messages routes the failed query particle back to its query vertex. So
no matter whether a query particle found a successful binding for all variables or
failed, it ends up being sent back to its query vertex.</p>
        <p>In order to keep track of query execution and determine when a query has
finished processing, each query particle is endowed with a number of tickets. The
first query particle starts out with a very large number of tickets.5</p>
        <p>When a query particle arrives at an index vertex, a copy of the particle is
sent along each edge. The original particle evenly splits up its tickets among
its copies. If there is a remainder, then some particles get an extra ticket. If a
particle does not have at least one ticket per copy, then copies only get sent
along edges for which at least one ticket was assigned, and those particles get
flagged to inform the query vertex that not all matching paths in the graph
were explored. Query execution finishes when the sum of tickets of all failed and
successful query particles received by the query vertex equals the initial ticket
endowment of the first particle that was sent out.
4 A match fails if it creates conflicting bindings: Pattern [ ?X inspired ?X ] fails to bind
to the triple [ Elvis inspired Dylan ], because the variable ?X cannot be bound to both
Elvis and Dylan.
5 We use Long.MaxValue, which has been enough for a complete exploration of all
queries on all datasets that we have experimented with so far.</p>
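<p>The ticket-splitting rule can be sketched as follows (a Python sketch; pairing each ticket share with a completeness flag is our rendering of the mechanism):</p>

```python
# Sketch of splitting a particle's tickets among n copies: every copy gets
# the integer share, and the remainder is handed out one extra ticket at a
# time. With fewer tickets than copies, only copies with a positive share
# are sent, and they carry an incomplete-exploration flag (False).

def split_tickets(tickets, n_copies):
    share, remainder = divmod(tickets, n_copies)
    shares = [share + (remainder > i) for i in range(n_copies)]
    complete = tickets >= n_copies
    return [(t, complete) for t in shares if t > 0]
```

<p>The shares always sum to the original ticket count, which is what lets the query vertex detect termination by summing the tickets it receives back.</p>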
        <p>Once query execution has finished, the query vertex reports the result that
consists of the variable bindings of the successful query particles, and then
removes itself from the graph.</p>
        <p>[Fig. 2: The subgraph for the triples [ Elvis inspired Dylan ] and [ Dylan inspired Jobs ], with the numbered steps 1 to 6 of the example query execution for ?X inspired ?Y, ?Y inspired ?Z; step 5 fails because no vertex with ID [ Jobs inspired * ] exists.]</p>
        <p>As an example illustrating a full query execution, consider the relevant
subgraph created for the triples [ Elvis inspired Dylan ] and [ Dylan inspired Jobs ],
shown in Fig. 2 along with the query processing for the query: (unmatched =
[ ?X inspired ?Y ], [ ?Y inspired ?Z ]; bindings = {}). Execution begins in the
query vertex.
1 Once the query vertex has been added to the graph, it emits a query particle,
which is illustrated in blue. Given its first triple pattern, the query particle
is routed to the index vertex with ID [ * inspired * ].
2 Inside the index vertex, the query particle is split into two particles, one
colored green and the other one orange (for illustration). The tickets of the
blue particle are evenly split among the green and the orange particle. Both
particles are sent down to their respective index vertex, the green one to [ *
inspired Dylan ] and the orange one to [ * inspired Jobs ]. These index vertices,
in turn, send the particles further down to their corresponding triple vertices.
3 The first pattern of the green particle gets matched in the triple vertex [ Elvis
inspired Dylan ]. The triple vertex sends the updated particle (unmatched = [
Dylan inspired ?Z ]; bindings = { ?X=Elvis, ?Y=Dylan }) to the index vertex
with ID [ Dylan inspired * ], which in turn routes the particle towards all
triples that match the next unmatched pattern, [ Dylan inspired ?Z ].
4 From the index vertex, the green particle is routed down to the triple
vertex [ Dylan inspired Jobs ], which binds ?Z to Jobs. As there are no more
unmatched triple patterns, the triple vertex routes the particle containing
successful bindings for all variables back to its query vertex.</p>
        <p>5 The first pattern of the orange particle gets matched in the triple vertex
[ Dylan inspired Jobs ]. The triple vertex sends the updated particle
(unmatched = [ Jobs inspired ?Z ]; bindings = { ?X=Dylan, ?Y=Jobs }) to the
index vertex with ID [ Jobs inspired * ]. The message cannot be delivered,
because no index vertex with that ID is found. The handler for undeliverable
messages reroutes the failed query particle to its query vertex.</p>
        <p>6 The query vertex receives both the successfully bound green and the failed
orange particle. Query execution has finished, because all tickets that were
sent out with the initial blue particle have been received again. The query
vertex reports the successful bindings { ?X=Elvis, ?Y=Dylan, ?Z=Jobs } and
then removes itself from the graph.</p>
        <p>The architecture and query execution scheme we described capture the
essence of how TripleRush works. Next, we explain how we improved them with
regard to performance and memory usage.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Optimizations</title>
        <p>The system, as previously shown, already supports parallel graph exploration.
Here we describe how we implemented and optimized different components.
Dictionary Encoding While the examples we gave so far represented triple
patterns in terms of strings, the actual system implementation operates on a
dictionary encoded representation, where RDF resource identifiers and literals
are encoded by numeric IDs. Both wildcards and variables are also represented
as numeric IDs, but variable IDs are only unique in the context of a specific
query.</p>
        <p>Index Graph Structure We remove triple vertices, and instead store the
triple information inside each of the three third level index vertices that have a
compatible triple pattern. We hence refer to these index vertices as binding index
vertices, because they can bind query particles to triples, which was previously
done by the triple vertices. This change saves memory and reduces the number
of messages sent during query execution.</p>
        <p>One question that arises from this change is: If the subject, predicate and
object of the next unmatched pattern of a particle are already bound, where should
that particle be routed to? With no single triple vertex responsible anymore,
TripleRush load balances such routings over the three binding index vertices
into which the triple information was folded.</p>
        <p>Index Vertex Representation In Fig. 1, one notices that the ID of an index
vertex varies only in one position (the subject, the predicate, or the object)
from the IDs of its children. To reduce the size of the edge representations, we do
not store the entire ID of child vertices, but only the specification of this position,
consisting of one dictionary encoded number per child. We refer to these numbers
as child-ID-refinements. The same reasoning applies to binding index vertices,
where the triples they store only vary in one position from the ID of the binding
index vertex. Analogously, we refer to these differences as triple-ID-refinements.</p>
        <p>Binding index vertices need to be able to check quickly if a triple exists. This
requires fast search over these triple-ID-refinements. We support this by storing
them in a sorted array on which we use the binary search algorithm.</p>
        <p>Index vertices of levels 1 and 2 do not need to check for the existence of
a specific child-ID, as these levels always route to all children. Routing only
requires a traversal of all child-ID-refinements. To support traversal in a
memory-efficient way, we sort the child-ID-refinements, perform delta encoding on them,
and store the encoded deltas in a byte array, using variable length encoding.</p>
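<p>The sorted-delta-varint representation can be sketched as follows. This is an illustrative Python sketch of the scheme described above; the exact variable length encoding used (here: 7 data bits per byte with a continuation bit) is an assumption.</p>

```python
# Sketch of the compact child-ID-refinement representation: sort the IDs,
# delta-encode them, and store each delta as a variable-length quantity.

def encode_refinements(ids):
    out, prev = bytearray(), 0
    for cur in sorted(ids):
        delta, prev = cur - prev, cur
        while True:
            low, delta = delta & 0x7F, delta >> 7
            out.append(low | (0x80 if delta else 0))  # high bit: more bytes
            if not delta:
                break
    return bytes(out)

def decode_refinements(data):
    ids, cur, chunk = [], 0, []
    for byte in data:
        chunk.append(byte & 0x7F)
        if not byte & 0x80:  # high bit clear: last byte of this delta
            cur += sum(b * 128 ** i for i, b in enumerate(chunk))
            ids.append(cur)
            chunk = []
    return ids
```

<p>Small deltas between sorted IDs thus cost a single byte each, which is what makes the traversal-only levels 1 and 2 cheap to store.</p>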
        <p>Note that these array representations are inefficient for modifications, which
is why we initially store the child/triple-ID-refinements in tree sets and switch
the representation to arrays once the loading is done. This design supports fast
inserts at the cost of increased memory usage during loading.</p>
        <p>Query Optimization The number of particle splits performed depends on the
order in which the triple patterns of a query are explored. One important piece of
information to determine the best exploration order is the number of triples that
match a given triple pattern, which we refer to as the cardinality of the pattern.
Because only relative cardinality differences matter for ordering, we can assume
a very large number for the root vertex. The binding index vertices already store
all the triples that match their pattern and thus have access to their cardinalities.
So we only need to determine the cardinality of index vertices on level 2, below
the root vertex and above the binding index vertices. A level 2 index vertex
requests cardinality counts from its binding index vertex children and sums up
the replies. We do this once after graph loading and before executing queries,
but it can be done at any time and could also be done incrementally.</p>
        <p>Query optimization happens only once inside the query vertex before the
query particle is emitted. To support it, the query vertex first sends out
cardinality requests to the vertices in the index graph that are responsible for the
respective triple patterns in the query. These requests get answered in parallel
and, once all cardinalities have been received, we greedily select the pattern with
the lowest cardinality to be matched first. If this match will bind a variable, we
assume that the cardinality of all other patterns that contain this variable is
reduced, because only a subset of the original triples that matched the pattern
would be explored at that point. To this end, we divide the cardinality estimate
for each triple pattern containing bound variables by a constant per bound
variable. In our experiments we set the constant to 100, based on exploration.6 If
all variables in a pattern are bound (meaning that all its variables appear in
6 We tried different factors and this one performed well on the LUBM benchmark. It
also performed well on the DBPSB benchmark, which suggests that it generalizes at
least to some degree.
patterns that will get matched before it), then we assume a cardinality of 1,
designating that at most one triple could match this pattern.</p>
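<p>The greedy ordering, including the division by the constant 100 per bound variable and the cardinality-1 rule for fully bound patterns, can be sketched as follows (a Python sketch with invented names; the real optimizer works on dictionary-encoded patterns):</p>

```python
# Sketch of the greedy, cardinality-based pattern ordering. Variables are
# strings starting with "?"; cardinality maps each pattern to its count.

REDUCTION = 100  # the constant from the paper (per bound variable)

def is_var(term):
    return isinstance(term, str) and term.startswith("?")

def order_patterns(patterns, cardinality):
    remaining, bound, ordered = list(patterns), set(), []
    while remaining:
        def estimate(p):
            variables = [t for t in p if is_var(t)]
            n_bound = sum(1 for v in variables if v in bound)
            if variables and n_bound == len(variables):
                return 1  # fully bound: at most one triple can match
            return max(1, cardinality[p] // REDUCTION ** n_bound)
        best = min(remaining, key=estimate)
        remaining.remove(best)
        ordered.append(best)
        bound.update(t for t in best if is_var(t))
    return ordered
```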
        <p>Repeating the procedure, we choose the next pattern with the lowest
cardinality estimate, until all patterns have been ordered. Next, the initial query particle
and all its copies explore the patterns in the order specified by the optimization.
Optimizations to Reduce Messaging Each TripleRush vertex is
transparently assigned to a Signal/Collect worker. A worker is comparable to a
thread that is responsible for messaging and for scheduling the execution of
its assigned vertices.</p>
        <p>Sending many individual messages between different Signal/Collect
workers is inefficient, because it creates contention on the worker message queues. In
order to reduce the number of messages sent, we use a bulk message bus that
bundles multiple messages sent from one worker to another. In order to reduce
message sizes and processing time in the query vertex, we do not send the
actual failed particle back to the query vertex, but only its tickets.7 We also use a
Pregel-like combiner that sums up the tickets in the bulk message bus, to again
reduce the number of messages sent.</p>
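<p>The ticket combiner can be sketched as follows: only the ticket counts of failed particles travel back, and counts addressed to the same query vertex are summed before dispatch (a minimal sketch; the actual combiner operates inside the bulk message bus):</p>

```python
# Sketch of a Pregel-style combiner for failed particles: ticket counts
# addressed to the same query vertex are summed into one message.

def combine_failed(messages):
    """messages: iterable of (query_vertex_id, tickets) pairs."""
    combined = {}
    for qid, tickets in messages:
        combined[qid] = combined.get(qid, 0) + tickets
    return combined
```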
        <p>Because the query vertex is a bottleneck, we further reduce the number of
messages it receives and the amount of processing it does by combining multiple
successful particles into one result buffer before sending them. The query vertex
can concatenate these result buffers in constant time.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>In the last section we described the TripleRush architecture and parallel query
execution. In this section we evaluate its performance compared to other
state-of-the-art triple stores.</p>
      <sec id="sec-4-1">
        <title>Performance</title>
        <p>
          TripleRush was built and optimized for query execution performance. In order
to evaluate TripleRush, we wanted to compare it with the fastest related
approaches. Trinity.RDF [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] is also based on a parallel in-memory graph store,
and it is, to our knowledge, the best performing related approach. Thus, our
evaluation is most interesting in a scenario where it is comparable to that of
Trinity.RDF. As Trinity.RDF is not available for evaluation, we decided to make
our results comparable by closely following the setup of their published parallel
evaluations. The Trinity.RDF paper also includes results for other in-memory
and on-disk systems that were evaluated with the same setup, which allows us
to compare TripleRush with these other systems in terms of performance.
7 The flag is also necessary and in practice we encode it in the sign.
Datasets and Queries Consistent with the parallel Trinity.RDF [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]
evaluation, we benchmarked the performance of TripleRush by executing the same
seven queries on the LUBM-160 dataset (~21 million triples) and the same eight
queries on the DBPSB-10 dataset (~14 million triples). The LUBM (Lehigh
University Benchmark) dataset is a synthetic one, generated with UBA1.7,8 while
the DBPSB (DBpedia SPARQL Benchmark) dataset is generated based on
real-world DBpedia data [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].9 The queries cover a range of different pattern
cardinalities, result set sizes and number of joins. The queries L1-L7 and D1-D8
are listed in the Appendix. These queries only match basic graph patterns and
do not use features unsupported by TripleRush, such as aggregations or filters.
More information about the datasets and the queries is found in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
Evaluation Setup In the Trinity.RDF paper, all triple stores are evaluated in
an in-memory setting, while RDF-3X and BitMat are additionally evaluated in
a cold cache setting [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>For evaluating TripleRush, we executed all queries on the same JVM running
on a machine with two twelve-core AMD Opteron 6174 processors and 66 GB
RAM, which is comparable to the setup used for the evaluation of Trinity.RDF.10
The whole set of queries was run 100 times before the measurements in order
to warm up the JIT compiler, and garbage collections were triggered before the
actual query executions. All query executions were complete, no query particle
ever ran out of tickets. We repeated this evaluation 10 times.11</p>
        <p>The execution time covers everything from the point where a query is
dispatched to TripleRush until the results are returned. Consistent with the
Trinity.RDF setup12, the execution times do include the time used by the query
optimizer, but do not include the mappings from literals/resources to IDs in the
query, nor the reverse mappings for the results.</p>
        <p>
          Result Discussion The top entries in Tables 1 and 2 show the minimum
execution times over 10 runs. According to our inquiry with the authors of the
Trinity.RDF paper [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], this is consistent with their measurements. Additionally, we
also report the average execution times for TripleRush. TripleRush performs
fastest on six of the seven LUBM queries, and on all DBPSB queries. For the
query where TripleRush is not the fastest system, it is the second fastest system.
        </p>
        <p>On all queries, TripleRush is consistently faster than Trinity.RDF. In the
geometric mean of both benchmarks, TripleRush is more than three times faster
than Trinity.RDF, between seven and eleven times faster than RDF-3X (in
memory) and between 31 and 103 times faster than BitMat (in memory). For
individual queries the results are even more pronounced: On query L7 TripleRush
is about ten times faster than Trinity.RDF, on L1 it is more than two orders of
magnitude faster than RDF-3X (in memory) and on L4 TripleRush is more than
three orders of magnitude faster than BitMat (in memory).</p>
        <p>These results indicate that the performance of TripleRush is competitive
with, or even superior to, other state-of-the-art triple stores.
[Table 1 (excerpt), average over 10 runs for TripleRush: 89.3, 60.1, 84.1, 1.7, 1.3, 2.3, 69.4; geometric mean 14.8.]</p>
      </sec>
      <sec id="sec-4-2">
        <title>Data Scalability</title>
        <p>Performance is a very important characteristic for a triple store, but it is also
important that the query execution time scales reasonably when queries are
executed over more triples.</p>
<p>We evaluate the data scalability of TripleRush by executing the LUBM
queries L1-L7 with the same setup as in subsection 4.1, but varying the dataset
size from 10 to 320 universities and measuring the average time over 10 runs.
The execution time for queries L1-L3 and L7 should increase with the size of the
dataset, which is proportional to the number of universities. Queries L4-L6 are
tied to a specific university and, given a good query plan, should take
approximately the same time to execute, regardless of the dataset size. The left chart in
Fig. 3 shows the execution times on a linear scale, while the right chart shows
both the number of universities and the execution times for queries L1-L3 and
L7 on a logarithmic scale. We see that queries L2 and L7 scale slightly
sublinearly. Queries L1 and L3 scale almost linearly until LUBM-160, and then by a
factor of more than three on the step to LUBM-320. We are uncertain about
what causes this difference.
As expected, the results in the left chart in Fig. 3 show that for queries L4-L6
the query execution time does not increase with the dataset size.</p>
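<p>The scaling behavior described above can be quantified from runtimes measured at doubling dataset sizes. A minimal sketch with hypothetical runtimes (not the measured data):</p>

```python
def growth_factors(runtimes):
    """Per-step growth factors for runtimes measured at doubling dataset
    sizes: a factor near 2.0 means linear scaling, below 2.0 sublinear,
    and near 1.0 constant time."""
    return [b / a for a, b in zip(runtimes, runtimes[1:])]

# Hypothetical runtimes (ms) at 10, 20, 40, and 80 universities.
linear_like   = growth_factors([10.0, 19.0, 37.0, 75.0])  # grows with data
constant_like = growth_factors([1.7, 1.7, 1.8, 1.7])      # size-independent
```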
        <p>Overall, this evaluation suggests that TripleRush query execution times scale
as expected with increased dataset sizes, but leaves an open question related to
the scaling of queries L1 and L3 on LUBM-320.</p>
        <p>!)s3.5  
(m 3  
iem2.5  
tn 2  
o
i
tu1.5  
xce 1  
.e0.5  
g
vA 0  
10   20   40   80   160   320  </p>
        <p>LUBM Universities!
10   20   40   80   160   320  </p>
        <p>
Another important aspect of a triple store is its memory usage and how it
increases with dataset size. In order to evaluate this aspect, we measured the
memory usage of TripleRush after loading LUBM dataset sizes ranging from 10
to 320 universities and optimizing their index representations (smallest memory
footprint of the entire program over 10 runs). Figure 4 shows that the memory usage
increases slightly sublinearly for this dataset. The memory footprint of TripleRush
is 5.8 GB when the 21 347 998 triples in the LUBM-160 dataset are loaded. This
is equivalent to 272 bytes per triple for this dataset size. TripleRush requires
3.8 GB for the DBPSB-10 dataset with 14 009 771 triples, which is equivalent
to 271 bytes per triple. This is a factor of 2 to 3.6 larger than the
index sizes measured for these datasets in Trinity.RDF [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], but far from the
index size of 19 GB measured for DBPSB-10 in BitMat [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
        <p>Currently, graph loading and index optimization for LUBM-160 takes as
little as 106 seconds (without dictionary encoding, average over 10 runs). This
is because the tree set data structure we use during graph loading supports
fast insertions. The flip side is the high memory usage, which causes the graph
loading of the LUBM-320 dataset to take excessively long. Most of that time is
spent on garbage collection, most likely because the entire assigned 31 GB heap
is used up during loading. After loading is finished, the index representation
optimizations reduce the memory usage to a bit more than 11 GB.</p>
<p>Overall, the index size of TripleRush is rather large, but in many cases this is
a reasonable tradeoff, given the performance.</p>
<p>Our current investigation and design have a number of limitations that should
be addressed in future work.</p>
        <p>First, we need to evaluate TripleRush with additional benchmarks.</p>
        <p>Second, and more importantly, we need to investigate the performance of
TripleRush on larger graphs, in a distributed setting. Whilst we are optimistic
that some of its optimizations will help even in the distributed setting, it is
unclear what the overall performance impact of increased parallelism and increased
messaging cost will be.</p>
        <p>Third, TripleRush was not built with SPARQL feature-completeness in mind.
Many SPARQL constructs, such as filters and aggregates, were not implemented.</p>
        <p>Fourth, the current query optimization is very simple and could be improved.</p>
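<p>One common improvement is a greedy, cardinality-driven ordering of triple patterns. The sketch below is illustrative only and assumes externally supplied cardinality estimates; it is not the TripleRush optimizer:</p>

```python
def order_patterns(patterns, cardinality):
    """Greedy join ordering: start with the most selective pattern
    (lowest estimated cardinality), then repeatedly pick the cheapest
    pattern that shares a variable with the ones already chosen
    (falling back to any remaining pattern)."""
    def variables(pattern):
        return {term for term in pattern if term.startswith("?")}

    remaining = sorted(patterns, key=cardinality)
    ordered = [remaining.pop(0)]
    bound = variables(ordered[0])
    while remaining:
        # Prefer patterns that join with already-bound variables.
        joinable = [p for p in remaining if variables(p) & bound] or remaining
        nxt = min(joinable, key=cardinality)
        remaining.remove(nxt)
        ordered.append(nxt)
        bound |= variables(nxt)
    return ordered

# Hypothetical cardinality estimates for the patterns of a small query.
est = {
    ("?x", "rdf:type", "ub:Course"): 1_000,
    ("?x", "ub:name", "?y"): 100_000,
}
plan = order_patterns(list(est), est.__getitem__)
```

<p>Here the selective rdf:type pattern is evaluated first, so the unselective name pattern only has to extend already-restricted bindings.</p>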
        <p>Fifth, the root vertex is a communication bottleneck. Potential avenues for
addressing this are to disallow a completely unbound query, which would retrieve
the whole database, or to partition this vertex.</p>
        <p>Sixth, the memory usage during graph loading should be reduced without
overly slowing down the loading times.</p>
        <p>Seventh, although the hardware we ran the benchmarks on had a lower
performance score, it is desirable to do a comparison with Trinity.RDF on exactly
the same hardware.</p>
<p>Even in light of these limitations, TripleRush is a highly competitive
system in terms of query execution performance. To our knowledge, it is the first
triple store that decomposes both storage and query execution into interconnected
processing elements, thereby achieving a high degree of parallelism that holds
the promise of transparent distributed scalability.</p>
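<p>The decomposition idea can be illustrated with a toy sequential matcher in which a partially matched query forks into one copy per triple that satisfies its next pattern, and fully matched copies become results. This is a conceptual sketch only, not the Signal/Collect-based implementation:</p>

```python
def match(triples, patterns, bindings=None):
    """Match a list of triple patterns against in-memory triples.
    Each partial match 'forks' into one copy per matching triple,
    loosely mirroring how TripleRush routes partially matched query
    copies through its index graph (here: plain sequential recursion)."""
    bindings = bindings or {}
    if not patterns:
        return [bindings]                      # fully matched copy
    first, rest = patterns[0], patterns[1:]
    results = []
    for triple in triples:
        candidate = dict(bindings)             # fork a copy of the match
        for term, value in zip(first, triple):
            if term.startswith("?"):           # variable: bind or verify
                if candidate.setdefault(term, value) != value:
                    break
            elif term != value:                # constant: must match
                break
        else:
            results.extend(match(triples, rest, candidate))
    return results

# Toy data shaped like LUBM query L5.
triples = [
    ("group1", "ub:subOrganizationOf", "dept0"),
    ("group1", "rdf:type", "ub:ResearchGroup"),
    ("group2", "ub:subOrganizationOf", "dept0"),
    ("group2", "rdf:type", "ub:Department"),
]
query = [
    ("?X", "ub:subOrganizationOf", "dept0"),
    ("?X", "rdf:type", "ub:ResearchGroup"),
]
answers = match(triples, query)
```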
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
<p>The need for efficient querying of large graphs lies at the heart of most Semantic
Web applications. The last decade of research in this area has shown tremendous
progress based on a database-inspired paradigm. Parallelizing these centralized
architectures is a complex task. The advent of multi-core computers, however,
calls for approaches that exploit parallelization.</p>
      <p>In this paper we presented TripleRush, an in-memory triple store that
inherently divides the query execution among a large number of active processing
elements that work towards a solution in parallel. We showed that this approach
is both fast and scalable.</p>
<p>Whilst TripleRush has its limitations, it is a step towards providing
high-performance triple stores that inherently embrace parallelism.</p>
      <p>Acknowledgments We would like to thank the Hasler Foundation for the
generous support of the Signal/Collect Project under grant number 11072
and Alex Averbuch as well as Cosmin Basca for their feedback on our ideas.</p>
<p>
        LUBM evaluation queries, originally used in the BitMat evaluation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and selected by its authors from the OpenRDF LUBM queries.
      </p>
      <p>L1: SELECT ?X ?Y ?Z WHERE {
?Z ub:subOrganizationOf ?Y.
?Y rdf:type ub:University.
?Z rdf:type ub:Department.
?X ub:memberOf ?Z.
?X rdf:type ub:GraduateStudent.
?X ub:undergraduateDegreeFrom ?Y.
}</p>
      <p>L2: SELECT ?X ?Y WHERE {
?X rdf:type ub:Course.
?X ub:name ?Y.
}</p>
      <p>L3: SELECT ?X ?Y ?Z WHERE {
?X rdf:type ub:UndergraduateStudent.
?Y rdf:type ub:University.
?Z rdf:type ub:Department.
?X ub:memberOf ?Z.
?Z ub:subOrganizationOf ?Y.
?X ub:undergraduateDegreeFrom ?Y.
}</p>
<p>L4: SELECT ?X ?Y1 ?Y2 ?Y3 WHERE {
?X ub:worksFor &lt;http://www.Department0.University0.edu&gt;.
?X rdf:type ub:FullProfessor.
?X ub:name ?Y1.
?X ub:emailAddress ?Y2.
?X ub:telephone ?Y3.
}</p>
      <p>L5: SELECT ?X WHERE {
?X ub:subOrganizationOf &lt;http://www.Department0.University0.edu&gt;.
?X rdf:type ub:ResearchGroup.
}</p>
      <p>L6: SELECT ?X ?Y WHERE {
?Y ub:subOrganizationOf &lt;http://www.University0.edu&gt;.
?Y rdf:type ub:Department.
?X ub:worksFor ?Y.
?X rdf:type ub:FullProfessor.
}</p>
      <p>L7: SELECT ?X ?Y ?Z WHERE {
?Y ub:teacherOf ?Z.
?Y rdf:type ub:FullProfessor.
?Z rdf:type ub:Course.
?X ub:advisor ?Y.
?X rdf:type ub:UndergraduateStudent.
?X ub:takesCourse ?Z.
}</p>
      <p>DBPSB evaluation queries, received courtesy of Kai Zeng.</p>
      <p>PREFIX foaf: &lt;http://xmlns.com/foaf/0.1/&gt;
PREFIX rdfs: &lt;http://www.w3.org/2000/01/rdf-schema#&gt;
PREFIX dbo: &lt;http://dbpedia.org/ontology/&gt;
PREFIX rdf: &lt;http://www.w3.org/1999/02/22-rdf-syntax-ns#&gt;
PREFIX dbpprop: &lt;http://dbpedia.org/property/&gt;
PREFIX dbpres: &lt;http://dbpedia.org/resource/&gt;
PREFIX rdfcore: &lt;http://www.w3.org/2004/02/skos/core#&gt;</p>
      <p>D1: SELECT ?X WHERE {
?Y rdfcore:subject dbpres:Category:First-person_shooters.
?Y foaf:name ?X.
}</p>
      <p>D2: SELECT ?X WHERE {
?Z foaf:homepage ?Y.
?Z rdf:type ?X.
}</p>
      <p>D3: SELECT ?X ?Y ?Z WHERE {
?Z rdfcore:subject dbpres:Category:German_musicians.
?Z foaf:name ?X.
?Z rdfs:comment ?Y.
}</p>
      <p>D4: SELECT ?W ?X ?Y ?Z WHERE {
?Z dbo:birthPlace dbpres:Berlin.
?Z dbo:birthDate ?X.
?Z foaf:name ?W.
?Z dbo:deathDate ?Y.
}</p>
      <p>D5: SELECT ?X ?Y ?Z WHERE {
?Z rdfcore:subject dbpres:Category:Luxury_vehicles.
?Z foaf:name ?Y.
?Z dbo:manufacturer ?W.
?W foaf:name ?X.
}</p>
      <p>D6: SELECT ?Z1 ?Z2 ?Z3 ?Z4 WHERE {
?X rdf:type ?Y.
?X dbpprop:name ?Z1.
?X dbpprop:pages ?Z2.
?X dbpprop:isbn ?Z3.
?X dbpprop:author ?Z4.
}</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>D.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Marcus</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Madden</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Hollenbach</surname>
          </string-name>
          .
          <article-title>Scalable Semantic Web Data Management Using Vertical Partitioning</article-title>
          .
          <source>In Proceedings of the 33rd International Conference on Very Large Data Bases</source>
          , pages
          <volume>411</volume>
–
          <fpage>422</fpage>
          . VLDB Endowment,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>Atre</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Chaoji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Zaki</surname>
          </string-name>
          , and
<string-name>
            <given-names>J. A.</given-names>
            <surname>Hendler</surname>
          </string-name>
          .
          <article-title>Matrix "bit" loaded: a scalable lightweight join query processor for rdf data</article-title>
          .
          <source>In Proceedings of the 19th international conference on World wide web, WWW '10</source>
          , pages
          <fpage>41</fpage>
–
          <fpage>50</fpage>
          , New York, NY, USA,
          <year>2010</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>J.</given-names>
            <surname>Dean</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ghemawat</surname>
          </string-name>
          .
<source>MapReduce: simplified data processing on large clusters. Communications of the ACM</source>
          ,
          <volume>51</volume>
          (
          <issue>1</issue>
          ):
          <volume>107</volume>
–
          <fpage>113</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>J. E.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Low</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Gu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bickson</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          .
          <article-title>Powergraph: distributed graph-parallel computation on natural graphs</article-title>
          .
          <source>In Proceedings of the 10th USENIX conference on Operating Systems Design and Implementation</source>
          , OSDI'
          <volume>12</volume>
          , pages
          <fpage>17</fpage>
–
          <fpage>30</fpage>
          , Berkeley, CA, USA,
          <year>2012</year>
          . USENIX Association.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>J.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Abadi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Ren</surname>
          </string-name>
          .
          <article-title>Scalable sparql querying of large rdf graphs</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>4</volume>
          (
          <issue>11</issue>
          ):
          <volume>1123</volume>
–
          <fpage>1134</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>S.</given-names>
            <surname>Kotoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Urbani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Boncz</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Mika</surname>
          </string-name>
          .
          <article-title>Robust runtime optimization and skew-resistant execution of analytical sparql queries on pig</article-title>
. In P. Cudré-Mauroux, J. Heflin, E. Sirin,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tudorache</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Euzenat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hauswirth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. X.</given-names>
            <surname>Parreira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Hendler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Schreiber</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          , and E. Blomqvist, editors,
          <source>International Semantic Web Conference (1)</source>
          , volume
          <volume>7649</volume>
          of Lecture Notes in Computer Science, pages
          <volume>247</volume>
–
          <fpage>262</fpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Y.</given-names>
            <surname>Low</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gonzalez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kyrola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Bickson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. M.</given-names>
            <surname>Hellerstein</surname>
          </string-name>
          .
          <article-title>Graphlab: A new parallel framework for machine learning</article-title>
          .
<source>In Conference on Uncertainty in Artificial Intelligence (UAI)</source>
          , Catalina Island, California,
          <year>July 2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>G.</given-names>
            <surname>Malewicz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. H.</given-names>
            <surname>Austern</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. J. C.</given-names>
            <surname>Bik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. C.</given-names>
            <surname>Dehnert</surname>
          </string-name>
          , I. Horn,
          <string-name>
            <given-names>N.</given-names>
            <surname>Leiser</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Czajkowski.</surname>
          </string-name>
          <article-title>Pregel: a system for large-scale graph processing</article-title>
          .
          <source>In SIGMOD Conference</source>
          , pages
          <volume>135</volume>
–
          <fpage>146</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>M.</given-names>
            <surname>Morsey</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          ,
and
          <string-name>
            <given-names>A.-C. N.</given-names>
            <surname>Ngomo</surname>
          </string-name>
          .
          <article-title>Dbpedia sparql benchmark: performance assessment with real queries on real data</article-title>
          .
          <source>In Proceedings of the 10th international conference on The semantic web - Volume Part I, ISWC'11</source>
          , pages
          <fpage>454</fpage>
–
          <fpage>469</fpage>
          , Berlin, Heidelberg,
          <year>2011</year>
          . Springer-Verlag.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>Scalable join processing on very large rdf graphs</article-title>
          .
          <source>In Proceedings of the 2009 ACM SIGMOD International Conference on Management of data, SIGMOD '09</source>
          , pages
          <fpage>627</fpage>
–
          <fpage>640</fpage>
          , New York, NY, USA,
          <year>2009</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>T.</given-names>
            <surname>Neumann</surname>
          </string-name>
          and
<string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>The RDF-3X engine for scalable management of RDF data</article-title>
          .
          <source>The VLDB Journal.The International Journal on Very Large Data Bases</source>
          ,
          <volume>19</volume>
          (
          <issue>1</issue>
          ):
          <volume>91</volume>
–
          <fpage>113</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>B.</given-names>
            <surname>Shao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>The trinity graph engine</article-title>
          .
<source>Technical Report 161291, Microsoft Research</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>P.</given-names>
            <surname>Stutz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          . Signal/Collect:
          <article-title>Graph Algorithms for the (Semantic) Web</article-title>
. In P. F. Patel-Schneider et al., editors,
          <source>International Semantic Web Conference (ISWC)</source>
          <year>2010</year>
          , volume LNCS
          <volume>6496</volume>
, pages
          <volume>764</volume>
          –
          <fpage>780</fpage>
          . Springer, Heidelberg,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
<given-names>C.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Karras</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          .
          <article-title>Hexastore: sextuple indexing for semantic web data management</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          ,
          <volume>1</volume>
          (
          <issue>1</issue>
          ):
          <volume>1008</volume>
–
          <fpage>1019</fpage>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <given-names>K.</given-names>
            <surname>Zeng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>A distributed graph engine for web scale rdf data</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          ,
          <volume>6</volume>
          (
          <issue>4</issue>
          ),
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>