=Paper= {{Paper |id=Vol-1193/paper_65 |storemode=property |title=Parallel OWL 2 RL Materialisation in Centralised, Main-Memory RDF Systems |pdfUrl=https://ceur-ws.org/Vol-1193/paper_65.pdf |volume=Vol-1193 |dblpUrl=https://dblp.org/rec/conf/dlog/MotikNPHO14 }} ==Parallel OWL 2 RL Materialisation in Centralised, Main-Memory RDF Systems== https://ceur-ws.org/Vol-1193/paper_65.pdf
             Parallel OWL 2 RL Materialisation in
            Centralised, Main-Memory RDF Systems

        Boris Motik, Yavor Nenov, Robert Piro, Ian Horrocks and Dan Olteanu
                    firstname.lastname@cs.ox.ac.uk

       Department of Computer Science, Oxford University, Oxford, United Kingdom



       Abstract. We present a novel approach to parallel materialisation (i.e., fixpoint
       computation) of OWL RL Knowledge Bases in centralised, main-memory, multi-
       core RDF systems. Our approach comprises a datalog reasoning algorithm that
       evenly distributes the workload to cores, and an RDF indexing data structure that
       supports efficient, ‘mostly’ lock-free parallel updates. Our empirical evaluation
       shows that our approach parallelises computation very well so, with 16 physical
       cores, materialisation can be up to 13.9 times faster than with just one core.


1   Introduction
The OWL 2 RL profile forms a fragment of datalog so that reasoning over OWL 2
RL knowledge bases (KB) can straightforwardly be rendered into answering datalog
queries. These datalog queries are either the OWL 2 RL/RDF rules [18, Section 4.3]
applied to the KB or the datalog translations [10] of the OWL 2 RL axioms of the
KB applied to the data portion (ABox) of the KB. Answering datalog queries can be
solved by backward chaining [2, 23], or one can materialise all consequences of the
rules and the data so that subsequent queries can be answered without the rules. Mate-
rialisation supports efficient querying, so it is commonly used in practice, but it is also
very expensive. We show that materialisation can be efficiently parallelised on modern
multi-core systems. In addition, main-memory databases have been gaining momen-
tum in academia and practice [16] due to the decreasing cost of RAM, so we focus
on centralised, main-memory, multi-core RDF systems. We present a new materialisa-
tion algorithm that evenly distributes the workload to cores, and an RDF data index-
ing scheme that supports efficient ‘mostly’ lock-free data insertion. Our techniques are
complementary to the ones for shared-nothing distributed RDF systems with nontrivial
communication cost between the nodes [23, 20, 25]: each node can parallelise compu-
tation and/or store RDF data using our approach.
     Materialisation has P-complete data complexity and is thus believed to be inher-
ently sequential. Nevertheless, many practical parallelisation techniques have been de-
veloped; we discuss these using the following OWL 2 RL example axioms and their
translation into datalog rules.

                  AvB                                  A(x, y) → B(x, y)                   (R1)
               C ◦E vD                       C(x, y) ∧ E(y, z) → D(x, z)                   (R2)
               D◦E vE                        D(x, y) ∧ E(y, z) → C(x, z)                   (R3)
Interquery parallelism identifies rules that can be evaluated in parallel. For example,
rules (R2) and (R3) must be evaluated jointly since C and D are mutually dependent,
but rule (R1) is independent since B is independent from C and D. Such an approach
does not guarantee a balanced workload distribution: for example, the evaluation of (R2)
and (R3) might be more costly than of (R1); moreover, the number of independent com-
ponents (two in our example) limits the degree of parallelism. Intraquery parallelism
assigns distinct rule instantiations to threads by constraining variables in rules to domain
subsets [7, 21, 9, 29, 27]. For example, with N threads and assuming that all objects
are represented as integers, the ith thread can evaluate (R1)–(R3) with (x mod N = i)
added to the rules’ antecedents. Such a static partitioning does not guarantee an even
workload distribution due to data skew. Systems such as WebPIE [23], Marvin [20],
C/MPI [25]; DynamiTE [24], and [13] use variants of these approaches to support OWL
2 RL fragments such as RDFS or pD∗ [22].
    In contrast, we handle general, recursive datalog rules using a parallel variant of the
seminaı̈ve algorithm [2]. Each thread extracts a fact from the database and matches it to
the rules; for example, given a fact E(a, b), a thread will match it to atom E(y, z) in rule
(R2) and evaluate subquery C(x, b) to derive the rule’s consequences, and it will handle
rule (R3) analogously. We thus obtain independent subqueries, each of which is evalu-
ated on a distinct thread. The difference in subquery evaluation times does not matter
because the number of queries is proportional to the number of tuples, and threads are
fully loaded. We thus partition rule instantiations dynamically (i.e., as threads become
free), unlike static partitioning which is predetermined and thus susceptible to skew.
    To support this idea in practice, an RDF storage scheme is needed that (i) supports
efficient evaluation of subqueries, and (ii) can be efficiently updated in parallel. To sat-
isfy (i), indexes over RDF data are needed. Hexastore [26] and RDF-3X [19] provide
six-fold sorted indexes that support merge joins and allow for a high degree of data
compression. Such approaches may be efficient if data is static, but data changes con-
tinuously during materialisation so maintaining sorted indexes or re-compressing data
can be costly and difficult to parallelise. Storage schemes based on columnar databases
[15] with vertical partitioning [1] suffer from similar problems.
    To satisfy both (i) and (ii), we use hash-based indexes that can efficiently match
all RDF atoms (i.e., RDF triples in which some terms are replaced with variables) and
thus support the index nested loops join. Hash table access can be easily parallelised,
which allows us to support ‘mostly’ lock-free [14] updates: most of the time, at least
one thread is guaranteed to make progress regardless of the remaining threads; how-
ever, threads do occasionally resort to localised locking. Lock-free data structures are
resilient to adverse thread scheduling and thus often parallelise better than lock-based
ones. Compared to the sort-merge [3] and hash join [4] algorithms, the index nested
loops join with hash indexes exhibits random memory access which is potentially less
efficient than sequential access, but our experiments suggest that hyperthreading and a
high degree of parallelism can compensate for this drawback.
    We have implemented our approach in a new system called RDFox and have eval-
uated its performance on several synthetic and real-world datasets. Parallelisation was
beneficial in all cases, achieving a speedup in materialisation times of up to 13.9 with
16 physical cores, rising up to 19.3 with 32 virtual cores obtained by hyperthreading.
Our system also proved competitive with OWLIM-Lite (a commercial RDF system)
and our implementation of the seminaı̈ve algorithm without parallelisation on top of
PostgreSQL and MonetDB, with the latter systems running on a RAM disk. We did not
independently evaluate query answering; however, queries are continuously answered
during materialisation, so we believe that our results show that our data indexing scheme
also supports efficient query answering over RDF data.


2      Preliminaries

A term is a resource (i.e., a constant) or a variable; we denote terms with t, and vari-
ables with x, y, and z. An (RDF) atom is a triple hs, p, oi of terms called the subject,
predicate, and object, respectively. A fact is a variable-free atom. A rule r has the form
(1), where H is the head atom, and B1 , . . . , Bn are body atoms; we let h(r) ··= H and
bi (r) ··= Bi .

                                      B1 ∧ . . . ∧ Bn → H                                     (1)

Rules must be safe: each variable in H must occur in some Bi . A program P is a finite
set of possibly recursive rules. The materialisation (i.e., the fixpoint) P ∞ (I) of a finite
set of facts I with P , a substitution σ and its application Aσ to an atom A, and the
composition τ θ of substitutions τ and θ are defined as usual [2].
    Our RDF system does not use the fixed OWL 2 RL/RDF rule set [18, Section 4.3].
Instead, we translate the OWL 2 RL axioms of the given KB into a datalog program;
such programs generally contain simpler rules with fewer body atoms and are thus
easier to evaluate. Our approach is, however, also applicable to OWL 2 RL/RDF rules.


3      Parallel Datalog Materialisation

We now present our algorithm that, given a datalog program P and a finite set of facts I,
computes P ∞ (I) on N threads. The data structure storing these facts (which we iden-
tify with I) must support several abstract operations: I.add(F ) should check whether
I contains a fact F and add it to I if not; moreover, I should provide an iterator factsI
where factsI .next returns a not yet returned fact or ε if such a fact does not exist, and
factsI .hasNext returns true if I contains a not yet returned fact. These operations need
not enjoy the ACID1 properties, but they must be linearisable [14]: each asynchronous
sequence of calls should appear to happen in a sequential order, with the effect of each
call taking place at an instant between the call’s invocation and response. Accesses to I
thus does not require external synchronisation via locks or critical sections.
    Furthermore, I must support an interface for answering conjunctive queries con-
strained to a subset of I; the latter will be used to prevent repeated derivations by a
rule. To formalise this, we assume that I can be viewed as a vector; then, for F ∈ I
a fact, I