<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scalable In-memory RDFS Closure on Billions of Triples</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eric L. Goodman</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Mizell</string-name>
          <email>dmizell@cray.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Cray, Inc.</institution>
          ,
          <addr-line>Seattle, WA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Sandia National Laboratories</institution>
          ,
          <addr-line>Albuquerque, NM</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <fpage>17</fpage>
      <lpage>31</lpage>
      <abstract>
<p>We present an RDFS closure algorithm, specifically designed and implemented on the Cray XMT supercomputer, that obtains inference rates of 13 million inferences per second on the largest system configuration we used. The Cray XMT, with its large global memory (4 TB for our experiments), permits the construction of a conceptually straightforward algorithm, fundamentally a series of operations on a shared hash table. Each thread is given a partition of triple data to process, a dedicated copy of the ontology to apply to the data, and a reference to the hash table into which it inserts inferred triples. The global nature of the hash table allows the algorithm to avoid a common obstacle for distributed memory machines: the creation of duplicate triples. On LUBM data sets ranging between 1.3 billion and 5.3 billion triples, we obtain nearly linear speedup except for two portions: file I/O, which can be ameliorated with additional service nodes, and data structure initialization, which requires nearly constant time for runs involving 32 processors or more.</p>
      </abstract>
      <kwd-group>
        <kwd>Semantic Web</kwd>
        <kwd>RDFS Closure</kwd>
        <kwd>Cray XMT</kwd>
        <kwd>Hashing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>Semantic web data in its most common format, the Resource Description
Framework (RDF), has two primary means for defining ontologies: RDF Schema (RDFS)
and the Web Ontology Language (OWL). Ontologies are useful mechanisms for
organizing domain knowledge, allowing explicit, formal descriptions of data to be
defined and shared. Also, ontologies allow reasoning about the domain, exposing
new facts through application of the ontology to existing data.</p>
      <p>In this paper we examine the simpler of the ontology languages, RDFS, and
perform RDFS reasoning on billions of triples completely in-memory on a highly
multithreaded shared-memory supercomputer, the Cray XMT. To our
knowledge, no one has performed such large RDFS reasoning in a single global shared
address space, nor has anyone previously achieved the inferencing rates we report
in this paper.</p>
      <p>The rest of the paper is organized as follows. Section 2 describes the Cray
XMT, its unique characteristics, and why it may be well suited to RDFS closure
and many semantic web applications in general. Section 3 describes the algorithm
we employ to perform closure. Sections 4 and 5 describe the experimental setup
and the results. We then conclude in sections 6 and 7 with a comparison to other
approaches and our path forward.
</p>
    </sec>
    <sec id="sec-2">
      <title>Cray XMT</title>
      <p>The Cray XMT is a unique shared-memory machine with multithreaded
processors especially designed to support fine-grained parallelism and perform well
despite memory and network latency. Each of the custom-designed compute
processors (called Threadstorm processors) comes equipped with 128 hardware
threads, called streams in XMT parlance, and the processor, rather than the
operating system, is responsible for scheduling the streams. To allow for single-cycle
context switching, each stream has a program counter, a status word, eight
target registers, and thirty-two general purpose registers. At each instruction cycle,
an instruction issued by one stream is moved into the execution pipeline. The
large number of streams allows each processor to avoid stalls due to memory
requests to a much larger extent than commodity microprocessors. For
example, after a processor has processed an instruction for one stream, it can cycle
through the other streams before returning to the original one, by which time
some requests to memory may have completed. Each Threadstorm processor can
currently support 8 GB of memory, all of which is globally
accessible. The system we use in this study has 512 processors and 4 TB of shared
memory. We also employed 16 Opteron nodes of the service partition, which
directly perform file and network I/O on behalf of the compute processors.</p>
<p>Programming on the XMT consists of writing C/C++ code augmented with
non-standard language features, including generics, intrinsics, futures, and
performance-tuning compiler directives such as pragmas.</p>
      <p>Generics are a set of functions the Cray XMT compiler supports that operate
atomically on scalar values, performing either read, write, purge, touch, and
int_fetch_add operations. Each 8-byte word of memory is associated with a
full/empty bit and the read and write operations interact with these bits to
provide light-weight synchronization between threads. Here are some examples
of the generics provided:
– readxx: Returns the value of a variable without checking the full/empty bit.
– readfe: Returns the value of a variable when the variable is in a full state,
and simultaneously sets the bit to be empty.
– writeef: Writes a value to a variable if the variable is in the empty state,
and simultaneously sets the bit to be full.</p>
      <p>– int_fetch_add: Atomically adds an integer value to a variable and returns the variable's previous value.</p>
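      <p>As a purely illustrative sketch (not taken from our code base; the function and variable names are hypothetical), the following shows how these generics are typically combined: int_fetch_add claims the next free slot in a shared array, while readfe and writeef use the full/empty bit to update a shared word without races.</p>
      <preformat><![CDATA[
/* Hypothetical sketch; assumes the Cray XMT compiler, which provides the
 * int_fetch_add, readfe, and writeef generics as built-ins. */

/* Claim the next index in a shared buffer.  int_fetch_add atomically adds
 * 1 to *next_free and returns the previous value. */
void append(long *buffer, long *next_free, long value) {
  long idx = int_fetch_add(next_free, 1);
  buffer[idx] = value;
}

/* Update a shared sum guarded by the word's full/empty bit.  readfe waits
 * until the word is full and leaves it empty; writeef writes the new value
 * and sets the bit back to full, releasing any waiting streams. */
void accumulate(long *sum, long value) {
  long current = readfe(sum);
  writeef(sum, current + value);
}
]]></preformat>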
      <p>Besides generics, there are also intrinsic functions that expose many low-level
machine operations to the C/C++ programmer.</p>
      <p>
        Parallelism is achieved explicitly through the use of futures, which allow
programmers to launch threads to perform some function, or implicitly, when the
compiler attempts to automatically parallelize for loops. The programmer can also provide
pragmas that provide hints to the compiler on how to schedule iterations of the
for loop to various threads, whether it be by blocks, interleaved, or dynamically,
or supply hints on how many streams to use per processor, etc. We extensively
use the #pragma mta for all streams i of n construct that allows
programmers to be cognizant of the total number of streams that the runtime has assigned
to the loop, as well as providing an iteration index that can be treated as the id
of the stream assigned to each iteration.
The XMT and its predecessors have a significant history of performing quite
well on graph algorithms and on applications that can be posed as graph
problems. Bader and Madduri [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] report impressive execution times and speedup for
fundamental graph theory problems, breadth-first search and st-connectivity.
Later, Madduri et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] use the ∆-stepping algorithm, performing single source
shortest path on a scale-free, billion-edge graph in less than ten seconds on 40
processors. More recently, Chin et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] examine triad census algorithms for
social network analysis on graphs with edges in the hundreds of millions. Also,
Jin et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] perform power grid contingency analysis on the XMT by posing the
problem as finding important links using betweenness centrality of the edges as a
metric.
      </p>
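      <p>Returning to the language features described above, the sketch below is a hypothetical illustration (the names and loop bodies are ours, not from our code base) of the two styles of loop parallelism: a simple loop left to the compiler's implicit parallelization, and a loop using the for all streams construct so that each stream claims one contiguous chunk of the data.</p>
      <preformat><![CDATA[
/* Hypothetical sketch; assumes the Cray XMT compiler. */

/* Implicitly parallelized: the compiler detects the independent iterations. */
void scale(long *data, long length, long factor) {
  for (long i = 0; i < length; i++)
    data[i] *= factor;
}

/* Explicit stream awareness: the pragma exposes the number of streams and a
 * per-stream id, so each stream processes one nearly equal-sized chunk. */
void touch_all(long *data, long length) {
  int stream_id, num_streams;
  #pragma mta for all streams stream_id of num_streams
  {
    long beg = ((long)stream_id * length) / num_streams;
    long end = ((long)(stream_id + 1) * length) / num_streams;
    for (long i = beg; i < end; i++)
      data[i] += 1;   /* stand-in for the per-element work */
  }
}
]]></preformat>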
      <p>
        To see how this relates to the semantic web, consider that the semantic web
amounts to a large, sparse graph. The predicate of an RDF triple can be thought
of as a directed edge between two nodes, the subject and the object. Several have
proposed thinking of the semantic web as a graph, and have suggested extensions
to SPARQL to enable advanced graph queries otherwise not available in standard
SPARQL, including SPARQ2L [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], nSPARQL [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], and SPARQLeR [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. Also,
Stocker et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] present an optimization scheme for SPARQL queries based on
the graph nature of RDF data. They state one limitation of the approach is due
to its implementation in main memory. However, this is not as much of an issue
for the XMT with its scalable global shared memory. Indeed, for the machine
we use in this study, its 4 TB of shared memory is sufficient for the data of
many semantic web applications to reside completely in memory. The XMT is
well suited for applications with (a) abundant parallelism, and (b) little or no
locality of reference. Many graph problems fit this description, and we expect
that some semantic web applications will as well, especially those that involve
more complex queries and inferencing.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Algorithm</title>
      <p>
        The algorithm we present here performs incomplete RDFS reasoning, and
calculates only the subset of rules that require two antecedents, namely rdfs rules 2,
3, 5, 7, 9, and 11 (for reference, see Table 1). This is fairly common in practice,
as the other rules are easily implemented and parallelized and generally produce
results that are not often used or can be inferred later at run time for individual
queries. The flow of the algorithm (see Figure 2) is similar to the one outlined
by Urbani et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] in that only a single pass over the rule set is required
(excluding rare cases where the ontology operates on RDFS properties) for full
and correct inferencing. However, we move processing of the transitivity rules,
5 and 11, to the beginning. These rules accept no inputs except for existing
ontological triples, and thus can be executed safely at the beginning. Also, we
do not remove duplicates because our algorithm produces no duplicates. We use
a global hash table described by Goodman et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to store the original and
inferred triples. Thus, duplicate removal is accomplished in-situ, i.e., insertion of
any triple produced during inferencing that already exists in the table results
in a no-op.
      </p>
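      <p>The effect of this in-situ duplicate elimination can be illustrated with a standard-library stand-in for the hash table of [4] (our implementation uses the XMT hash table, not std::set): inserting a triple that is already present simply has no effect.</p>
      <preformat><![CDATA[
#include <cstdint>
#include <set>
#include <tuple>

// Illustration only: std::set stands in for the XMT hash table of [4].
using Triple = std::tuple<uint64_t, uint64_t, uint64_t>;  // (s, p, o) ids

// Inserting an already-present triple is a no-op, so no separate
// duplicate-removal pass is ever needed.
bool add_triple(std::set<Triple>& table, const Triple& t) {
  return table.insert(t).second;  // false if t was already in the table
}
]]></preformat>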
      <table-wrap id="tab1">
        <label>Table 1</label>
        <caption><p>The RDFS entailment rules.</p></caption>
        <table>
          <thead>
            <tr><th>Rule</th><th>Condition 1</th><th>Condition 2 (optional)</th><th>Triple to Add</th></tr>
          </thead>
          <tbody>
            <tr><td>lg</td><td>s p o (o is a literal)</td><td/><td>s p _:n</td></tr>
            <tr><td>gl</td><td>s p _:n</td><td/><td>s p o</td></tr>
            <tr><td>rdf1</td><td>s p o</td><td/><td>p type Property</td></tr>
            <tr><td>rdf2</td><td>s p o (o is a literal of type t)</td><td/><td>_:n type t</td></tr>
            <tr><td>rdfs1</td><td>s p o (o is a literal)</td><td/><td>_:n type Literal</td></tr>
            <tr><td>rdfs2</td><td>p domain x</td><td>s p o</td><td>s type x</td></tr>
            <tr><td>rdfs3</td><td>p range x</td><td>s p o</td><td>o type x</td></tr>
            <tr><td>rdfs4a</td><td>s p o</td><td/><td>s type Resource</td></tr>
            <tr><td>rdfs4b</td><td>s p o</td><td/><td>o type Resource</td></tr>
            <tr><td>rdfs5</td><td>p subPropertyOf q</td><td>q subPropertyOf r</td><td>p subPropertyOf r</td></tr>
            <tr><td>rdfs6</td><td>p type Property</td><td/><td>p subPropertyOf p</td></tr>
            <tr><td>rdfs7</td><td>s p o</td><td>p subPropertyOf q</td><td>s q o</td></tr>
            <tr><td>rdfs8</td><td>s type Class</td><td/><td>s subClassOf Resource</td></tr>
            <tr><td>rdfs9</td><td>s type x</td><td>x subClassOf y</td><td>s type y</td></tr>
            <tr><td>rdfs10</td><td>s type Class</td><td/><td>s subClassOf s</td></tr>
            <tr><td>rdfs11</td><td>x subClassOf y</td><td>y subClassOf z</td><td>x subClassOf z</td></tr>
            <tr><td>rdfs12</td><td>p type ContainerMembershipProperty</td><td/><td>p subPropertyOf member</td></tr>
            <tr><td>rdfs13</td><td>o type Datatype</td><td/><td>o subClassOf Literal</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
        As a preprocessing step, we run the set of triples through a dictionary
encoding that translates the resource and predicate strings into 64-bit integers. We
again use the hash table described earlier to store the mapping and perform the
translation. As the triples are now sets of three integers, we can make use of the
following hash function during the RDFS closure algorithm:
        <disp-formula id="eq1">
          <label>(1)</label>
          <tex-math>\mathit{hash}(t) = ((t.s + t.p \cdot B + t.o \cdot B^{2}) \cdot C) \bmod S_{\mathit{table}}</tex-math>
        </disp-formula>
        where t is a triple, C = 31,280,644,937,747 is a large prime constant (taken
from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), S<sub>table</sub> is the size of the hash table, and B is a base to prevent the
subject, predicate, and object ids from colliding. The current dictionary encoding
algorithm assigns ids to resources and properties by keeping a running counter,
and a triple element's id is the value of the counter at the time the element was
seen during processing. Thus, the sets of integers assigned to the subjects,
predicates, and objects overlap. The base helps the hash function avoid collisions
due to this overlap. For our experiments, we used B = 5e9, near the maximum
value assigned to any triple element. This hash function was decided upon after
some experimentation, but probably deserves further attention.</p>
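      <p>A direct transcription of Equation (1) into C++ looks like the sketch below; the values of B and C are those given above, and the triple struct is a hypothetical stand-in for our actual representation (note that uint64_t arithmetic wraps modulo 2^64, which may differ from an unbounded-precision reading of the formula).</p>
      <preformat><![CDATA[
#include <cstdint>

// Hypothetical triple of dictionary-encoded ids.
struct triple_t { uint64_t s, p, o; };

// Constants from the text: B is the base, C is the large prime from [4].
static const uint64_t B = 5000000000ULL;        // 5e9
static const uint64_t C = 31280644937747ULL;    // 31,280,644,937,747

// Equation (1): hash(t) = ((t.s + t.p*B + t.o*B^2) * C) mod S_table.
uint64_t hash_triple(const triple_t& t, uint64_t table_size) {
  return ((t.s + t.p * B + t.o * B * B) * C) % table_size;
}
]]></preformat>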
      <p>The first step (line 1 of Figure 2) of the RDFS closure process on the XMT is
to transfer the RDF triple data from the service nodes to the compute nodes. We
use the Cray XMT snapshot library that allows programs to move data back and
forth from the Lustre file system. On the service nodes a Linux process called a
file service worker (fsworker) coordinates the movement of data from disk to the
compute nodes. Multiple file service workers can run on multiple service nodes,
providing greater aggregate bandwidth.</p>
      <p>The next step (lines 2-5 of Figure 2) is to load the ontological data into
multimap data structures. All of the rules under consideration involve a join
between a data triple or an ontological triple on one side and an ontological
triple on the other, and in all cases the subject of the latter ontological triple is
matched with a subject, predicate, or object of the former. To speed processing of the rules and
the associated join operations, we create four multimaps of the form f : Z → Z∗,
one for each of the predicates rdfs:subPropertyOf, rdfs:domain, rdfs:range,
and rdfs:subClassOf, that maps subject ids to potentially multiple object ids.
For example, the rdfs:subClassOf multimap is indexed according to class
resources and maps to all stated super classes of those resources. After initially
populating the multimap with known mappings as asserted in the original triple
set (see Figure 3(a)), we then update each of the rdfs:subPropertyOf and
rdfs:subClassOf multimaps with a complete listing of all super properties and
super classes as dictated by inference rules 5 and 11 (see Figure 3(b)).</p>
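      <p>As a rough illustration of this step (using standard containers rather than the multimap we actually implemented on the XMT), the rdfs:subClassOf map can be built from the asserted triples and then expanded in place to the full set of superclasses required by rule 11:</p>
      <preformat><![CDATA[
#include <cstdint>
#include <map>
#include <set>
#include <vector>

// Illustration only; the XMT implementation uses a custom multimap.
using Id = uint64_t;
using Multimap = std::map<Id, std::set<Id>>;  // class id -> stated superclasses

// Expand each entry to all reachable superclasses (rule rdfs11).
void transitive_closure(Multimap& sub_class_of) {
  for (auto& [cls, supers] : sub_class_of) {
    std::vector<Id> work(supers.begin(), supers.end());
    while (!work.empty()) {
      Id super = work.back();
      work.pop_back();
      auto it = sub_class_of.find(super);
      if (it == sub_class_of.end()) continue;
      for (Id grand : it->second)
        if (grand != cls && supers.insert(grand).second)
          work.push_back(grand);   // newly discovered superclass
    }
  }
}
]]></preformat>
      <p>The same expansion applies, unchanged, to the rdfs:subPropertyOf multimap for rule 5.</p>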
      <p>After populating the multimaps and applying the transitivity rules, we then
replicate the entirety of this data and the accompanying data structures for
each and every stream (see Figure 3(c)). The reason for this is to avoid read
hot-spotting. The ontology is relatively small compared with the data to be
processed, so many streams end up vying for access if only one copy of the ontology
is available to all streams. Replicating the data circumvents these memory
contention issues.</p>
      <p>The next two steps (lines 6-7 of Figure 2) involve iterating over all of the
original triples and adding them to the hash table, and also populating four
queues, also implemented as hash tables, that store triples matching rules 2, 3,
7, and 9. Ideally, this could be solved with one pass through the original triple set,
but the XMT favors tight, succinct loops. Processing all steps at once results in
poor scalability, probably because the compiler is better able to optimize smaller,
simpler loops. Thus, each of these individual steps receives its own for loop.
</p>
      <fig id="fig3">
        <label>Fig. 3</label>
        <caption><p>(a) Original Multimap; (b) Transitivity Applied; (c) Replicated.</p></caption>
      </fig>
      <p>The last four steps (lines 8-14 of Figure 2) complete the calculation of RDFS
closure. In general, each of these steps follow the ComputeRule procedure
outlined in Figure 4. There are three parameters: an array of triples that match the
given rule, a multimap that stores subject/object mappings for a given
predicate, and a buffer to store new triples. We use the for all streams construct,
allowing us to know the total number of streams, num streams, and a stream
identifier, stream id to use within the outer for loop. Line 5 partitions the
matching triples into nearly equal-sized chunks for each stream. The for loop from lines
7 to 10 determines the number of new triples that will be created. This first pass
at the data allows us to call int_fetch_add once for each stream instead of l
times, where l is the length of the matching triple array. In essence, when we call
int_fetch_add in line 11 we claim a block of indices in the buffer array that can
be iterated over locally instead of requiring global cooperation by all streams.
This helps to avoid hot-spots on the total variable. Lines 12 through 17 iterate
over the stream’s set of triples and then iterate over the set of matching rules in
the multimap to create new inferred triples. After application of ComputeRule,
inferred triples in the buffer are added both to the main hash table and to
appropriate rule queues. Again, these two steps are done separately to maintain
scalability.</p>
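      <p>In XMT-flavored C/C++, the pattern just described looks roughly like the sketch below; it is a hypothetical rendering of the ComputeRule procedure of Figure 4 rather than a copy of our source, and num_values and new_triple stand in for the multimap lookup and triple construction.</p>
      <preformat><![CDATA[
/* Hypothetical stand-ins for our actual types and helpers. */
typedef struct { unsigned long s, p, o; } triple_t;
typedef struct multimap_s multimap_t;            /* opaque ontology multimap */
long num_values(const triple_t *t, const multimap_t *mm);
triple_t new_triple(const triple_t *t, const multimap_t *mm, long j);

/* Assumes the Cray XMT compiler (for the pragma and int_fetch_add). */
void compute_rule(const triple_t *matching, long num_matching,
                  const multimap_t *mm, triple_t *buffer, long *total) {
  int stream_id, num_streams;
  #pragma mta for all streams stream_id of num_streams
  {
    long beg = ((long)stream_id * num_matching) / num_streams;
    long end = ((long)(stream_id + 1) * num_matching) / num_streams;

    /* Pass 1: count how many triples this stream will produce. */
    long local_total = 0;
    for (long i = beg; i < end; i++)
      local_total += num_values(&matching[i], mm);

    /* One int_fetch_add per stream claims a block of indices in buffer. */
    long start = int_fetch_add(total, local_total);

    /* Pass 2: fill the claimed block with no further synchronization. */
    for (long i = beg; i < end; i++) {
      long n = num_values(&matching[i], mm);
      for (long j = 0; j < n; j++)
        buffer[start++] = new_triple(&matching[i], mm, j);
    }
  }
}
]]></preformat>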
      <p>
        Overall, the scalability and success of this algorithm largely rests on three
main factors:
      </p>
      <fig id="fig4">
        <label>Fig. 4</label>
        <caption><p>The ComputeRule procedure.</p></caption>
        <preformat>
 1: procedure ComputeRule(matching_triples, multimap, buffer)
 2:   total ← 0
 3:   for all streams stream_id of num_streams do
 4:     l ← length(matching_triples)
 5:     beg, end ← determine_beg_end(l, num_streams, stream_id)
 6:     local_total ← 0
 7:     for i ← beg, end do
 8:       num ← num_values(matching_triples[i], multimap)
 9:       local_total ← local_total + num
10:     end for
11:     start ← int_fetch_add(total, local_total)
12:     for i ← beg, end do
13:       for j ← 1, num_values(matching_triples[i]) do
14:         buffer[start] ← new_triple(matching_triples[i], multimap, j)
15:         start ← start + 1
16:       end for
17:     end for
18:   end for
19: end procedure
        </preformat>
      </fig>
      <p>
        We used the Lehigh University Benchmark (LUBM) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Specifically, we
generated the LUBM10k and LUBM40k data sets, approximately 1.33 and 5.34 billion
triples, respectively. Both increase in size after closure by about 25%, resulting
in 341 million and 1.36 billion inferences.
      </p>
      <p>To validate our approach, we applied a fixed-point algorithm written in
Python (the RDFSClosure module, http://www.ivan-herman.net/Misc/PythonStuff/RDFSClosure/Doc/RDFSClosuremodule.html)
to smaller LUBM instances, commenting out rules we did not implement
in our algorithm.</p>
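      <p>The fixed-point structure of that validation can be sketched as follows (a generic skeleton in C++ for illustration; the validation itself was done with the Python module noted above):</p>
      <preformat><![CDATA[
#include <functional>
#include <set>
#include <tuple>
#include <vector>

// Hypothetical types for a reference (non-XMT) fixed-point closure.
struct Triple { long s, p, o; };
bool operator<(const Triple& a, const Triple& b) {
  return std::tie(a.s, a.p, a.o) < std::tie(b.s, b.p, b.o);
}

// A rule maps the current triple set to the triples it can infer from it.
using Rule = std::function<std::set<Triple>(const std::set<Triple>&)>;

// Apply every rule repeatedly until no rule produces a new triple.
void fixed_point(std::set<Triple>& triples, const std::vector<Rule>& rules) {
  bool changed = true;
  while (changed) {
    changed = false;
    for (const Rule& rule : rules)
      for (const Triple& t : rule(triples))
        changed = triples.insert(t).second || changed;
  }
}
]]></preformat>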
      <p>
        The code we wrote for RDFS closure on the XMT is open source and
publicly available. Most of the code is published as part of the MapReduceXMT
code base (https://software.sandia.gov/trac/MapReduceXMT). Our initial implementation
looked at porting the MapReduce implementation of Urbani et al. [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] to the XMT, but to obtain better
performance we adopted an implementation that was more closely tailored to the
XMT. The main class is rdfs_closure_native.cpp, located in the test
directory. We also made use of functionality provided by the MultiThreaded Graph
Library (https://software.sandia.gov/trac/mtgl), primarily mtgl/xmt_hash_table.hpp.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <fig id="fig-scaling">
        <caption><p>Inferences per second versus number of processors (up to 512), broken out as Total, All but Init and IO, File IO, and Initialization.</p></caption>
      </fig>
      <p>
We also examined how the number of fsworkers affects aggregate I/O throughput
to the compute nodes. We kept the number of processors constant at 128 and
varied the number of fsworkers between 1 and 16, each fsworker running on
a single node of the service partition. We tested the read time on LUBM40k.
Figure 7 shows the rates obtained and the ideal rates expected if performance
increased linearly with the number of fsworkers. Overall, we experienced about
a 12.4× increase in I/O performance for 16 fsworkers over 1 fsworker, indicating
that if I/O is a bottleneck for particular applications, it can be ameliorated with
additional service nodes and fsworkers.</p>
    </sec>
    <sec id="sec-5">
      <title>Comparison to other Approaches</title>
      <p>
        There are two main candidates for comparison with our approach. The first is an
MPI-based implementation of RDFS closure developed by Weaver and Hendler
[
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which we will refer to simply as MPI. The second is WebPIE, a
MapReduce-style computation developed by Urbani et al.
      </p>
      <fig id="fig7">
        <label>Fig. 7</label>
        <caption><p>File I/O rate (GB/s) and ideal I/O speedup versus the number of fsworkers.</p></caption>
      </fig>
      <p>
        The authors first focused on
RDFS closure [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and then expanded their efforts to include a fragment of OWL
semantics [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Weaver and Hendler developed an embarrassingly parallel algorithm for RDFS
closure by replicating the ontology for each process and partitioning up the data.
They utilized a 32-node machine with two dual-core 2.6 GHz Opteron processors
per node. They achieved linear scalability, but with the side effect of producing
duplicates: the individual processes do not communicate and do not share data,
so it is impossible to know whether a triple has already been produced elsewhere.</p>
      <p>Besides the fundamental algorithmic difference between our implementation
and theirs concerning duplicate creation (they create duplicates when we do
not), there are several other factors which make a direct comparison difficult.
For one, they operate on raw triple data rather than integers. Also, they perform
all of the rules listed in Table 1 while we compute a subset. Finally, they include
both reading the data in from disk and writing the result out, where we only
report the read time.</p>
      <p>
        The work of Urbani et al. offers a much more direct comparison to our work,
as they first perform a dictionary encoding algorithm [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] as a preprocessing
step and also perform a similar subset of RDFS rules as we do in this work.
They employ a series of MapReduce functions to perform RDFS closure. Like
the Weaver and Hendler approach, their algorithm produces duplicates during
processing; however, they account for that and have a specific stage dedicated to
the removal of duplicates. Their experiments were conducted on a 64-node machine
with two dual-core 2.4GHz Opteron processors per node. It should be noted that
they’ve shown results computing closure on 100 billion triples. Our approach,
since we limit ourselves to in-memory computations, cannot reach that scale,
at least given a similar number of processors. With a specialized version of the
hash table code with memory optimizations not available in the more general
implementation, we estimate the largest data set we could handle within 4 TB
of memory to be about 20 billion triples. However, the next generation XMT
system, due around the end of 2010, is expected to have as much as eight times
as much memory per processor.
      </p>
      <table-wrap id="tab2">
        <caption><p>Inference rates of the approaches compared.</p></caption>
        <table>
          <thead>
            <tr><th>Approach</th><th>Inferences/second</th></tr>
          </thead>
          <tbody>
            <tr><td>MPI (32 nodes)</td><td>574e3</td></tr>
            <tr><td>WebPIE (64 nodes)</td><td>∼ 700e3</td></tr>
            <tr><td>XMT (512 proc.)</td><td>13.2e6</td></tr>
            <tr><td>XMT (256 proc.)</td><td>11.9e6</td></tr>
            <tr><td>XMT (128 proc.)</td><td>9.43e6</td></tr>
            <tr><td>XMT (64 proc.)</td><td>6.50e6</td></tr>
            <tr><td>XMT (32 proc.)</td><td>4.04e6</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>
We have presented an RDFS closure algorithm for the Cray XMT that achieves
inference rates far beyond that of the current state of the art, about 7-9 times
faster when comparing Threadstorm processors to commodity processor cores.
Part of our approach’s success is due to the unique architecture of the Cray XMT,
allowing us to store large triple stores entirely in memory, avoiding duplication
of triples and also costly trips to disk.</p>
      <p>There are several obvious avenues for future work. In this paper we
examined only artificially generated data sets with a relatively simple ontology. We
would like to examine real-world data, and especially explore the effect that more
complex ontologies have on our algorithm and make adjustments as necessary.</p>
      <p>
        We also want to expand this work to encompass OWL semantics, probably
focusing first on the Horst fragment [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], as it is among the most computationally
feasible subsets of OWL. We believe the XMT will again be well suited to this
problem, maybe even more so than RDFS. For instance, some rules in OWL
Horst require two instance triples to be joined with one schema triple. This
poses a challenge for distributed memory machines in that you can no longer
easily partition the data as with RDFS, which had at most one instance triple as
an antecedent. The XMT can store all of the instance triples in its large global
memory, and create indices as needed to quickly find matching triples.
      </p>
      <p>Also, there are some tricks to try that might help eliminate some of the
constant overhead we see with file I/O (for a fixed number of service nodes)
and data structure initialization. File I/O and initialization both require low
numbers of threads, so it may be possible to overlay them, using a future to
explicitly launch initialization just before loading the data.</p>
      <p>
        One limitation of the approach presented in this paper is the use of fixed
table sizes, forcing the user to have accurate estimates of the number of inferences
that may be generated. Using Hashing with Chaining and Region-based Memory
Allocation (HACHAR), also presented in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], may alleviate this requirement. It
allows for low-cost dynamic growth of the hash table and permits load factors
far in excess of the initial size; however, the current implementation exhibits
poorer scalability than the open addressing scheme we used for this paper.
      </p>
      <p>
        Finally, we wish to explore a fairer comparison between our work on a shared
memory platform to a similar in-memory approach on distributed memory
machines. The work presented by Weaver and Hendler [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is the closest we can
find to in-memory RDFS closure on a cluster, but the algorithmic differences
make it difficult to draw definite conclusions. The MapReduce-MPI Library,
a MapReduce-inspired framework built on top of MPI, permits the
development of entirely in-memory algorithms using standard MapReduce constructs.
As such, it is an ideal candidate for producing a more comparable RDFS closure
implementation for distributed memory platforms.
      </p>
      <p>Acknowledgments. This research was funded by the DoD via the CASS-MT
program at Pacific Northwest National Laboratory. We wish to thank the Cray
XMT software development team and their manager, Mike McCardle, for
providing us access to Nemo, the 512-processor XMT in Cray’s development lab.
We also thank Greg Mackey and Sinan al-Saffar for their thoughtful reviews of
this paper.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Anyanwu</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maduko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sheth</surname>
            ,
            <given-names>A.P.</given-names>
          </string-name>
          <article-title>SPARQ2L: Towards Support for Subgraph Extraction Queries in RDF Databases</article-title>
          .
          <source>In Proceedings of WWW. Banff</source>
          , Alberta, Canada (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Bader</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madduri</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>Designing Multithreaded Algorithms for Breadth-First Search and st-connectivity on the Cray MTA-2</article-title>
          .
          <source>In Proceedings of the 35th International Conference on Parallel Processing. Columbus</source>
          ,
          <string-name>
            <surname>OH</surname>
          </string-name>
          , USA (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Chin</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marquez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Choudhury</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maschoff</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          <article-title>Implementing and Evaluating Multithreaded Triad Census Algorithms on the Cray XMT</article-title>
          .
          <source>In Proceedings of the 23rd IEEE International Symposium on Parallel and Distributed Processing</source>
          . Rome, Italy (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Goodman</surname>
            ,
            <given-names>E.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haglin</surname>
            <given-names>D.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Scherrer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chavarría-Miranda</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mogill</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feo</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Hashing Strategies for the Cray XMT</article-title>
          .
          <source>In Proceedings of the IEEE Workshop on Multi-Threaded Architectures and Applications</source>
          . Atlanta,
          <string-name>
            <surname>GA</surname>
          </string-name>
          , USA (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Guo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heflin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          <article-title>LUBM: A Benchmark for OWL Knowledge Base Systems</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          -3) (
          <year>October 2005</year>
          )
          <fpage>158</fpage>
          -
          <lpage>182</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , Chavarría, D.G.,
          <string-name>
            <surname>Feo</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wong</surname>
            ,
            <given-names>P.C.</given-names>
          </string-name>
          <article-title>A Novel Application of Parallel Betweenness Centrality to Power Grid Contingency Analysis</article-title>
          .
          <source>In Proceedings of the 24th IEEE International Parallel and Distributed Processing Symposium</source>
          . Atlanta,
          <string-name>
            <surname>GA</surname>
          </string-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Kochut</surname>
            ,
            <given-names>K.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janik</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          <article-title>SPARQLeR: Extended Sparql for Semantic Association Discovery</article-title>
          .
          <source>In Proceedings of the 4th European Semantic Web Conference</source>
          . Innsbruck,
          <string-name>
            <surname>Austria</surname>
          </string-name>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Madduri</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bader</surname>
            ,
            <given-names>D.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berry</surname>
            ,
            <given-names>J.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crobak</surname>
            ,
            <given-names>J.R.</given-names>
          </string-name>
          <article-title>An Experimental Study of a Parallel Shortest Path Algorithm for Solving Large-Scale Graph Instances</article-title>
          .
          <source>In Proceedings of the Workshop on Algorithm Engineering and Experiments</source>
          . New Orleans, LA, USA (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>ter Horst</surname>
            ,
            <given-names>H.J.</given-names>
          </string-name>
          <article-title>Completeness, Decidability, and Complexity of Entailment for RDF Schema and a Semantic Extension Involving the OWL Vocabulary</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          -3) (
          <year>October 2005</year>
          )
          <fpage>79</fpage>
          -
          <lpage>115</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Pérez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arenas</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          <article-title>nSPARQL: A Navigational Language for RDF</article-title>
          .
          <source>In Proceedings of the 7th International Semantic Web Conference</source>
          . Springer, Karlsruhe, Germany (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11. RDF Semantics,
          <source>W3C Recommendation, 10 February</source>
          <year>2004</year>
          , http://www.w3.org/TR/2004/REC-rdf-mt-20040210/
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Stocker</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seaborne</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bernstein</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kiefer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reynolds</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          <article-title>SPARQL Basic Graph Pattern Optimization Using Selectivity Estimation</article-title>
          .
          <source>In Proceedings of WWW</source>
          . Beijing, China (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Urbani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oren</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <article-title>Scalable Distributed Reasoning using MapReduce</article-title>
          .
          <source>In Proceedings of the 8th International Semantic Web Conference</source>
          . Springer, Washington D.C.,
          <string-name>
            <surname>USA</surname>
          </string-name>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Urbani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maassen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , van
          <string-name>
            <surname>Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bal</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>OWL reasoning with WebPIE: calculating the closure of 100 billion triples</article-title>
          .
          <source>In Proceedings of the 7th Extended Semantic Web Conference</source>
          . Heraklion,
          <string-name>
            <surname>Greece</surname>
          </string-name>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Urbani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maassen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bal</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          <article-title>Massive Semantic Web data compression with MapReduce</article-title>
          .
          <source>In Proceedings of the MapReduce workshop at High Performance Distributed Computing Symposium</source>
          . Chicago, IL, USA (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Weaver</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          <article-title>Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples</article-title>
          .
          <source>In Proceedings of 8th International Semantic Web Conference</source>
          . Washington, DC, USA (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>