<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Rule-based Reasoning on Massively Parallel Hardware</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Peters</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christopher Brink</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sabine Sachweh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Albert Zundorf</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Electrical Engineering</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Applied Sciences Dortmund, Germany, Department of Computer Science</institution>
        </aff>
      </contrib-group>
      <fpage>33</fpage>
      <lpage>48</lpage>
      <abstract>
        <p>In order to enable the semantic web as well as other time critical semantic applications, scaleable reasoning mechanisms are indispensable. To address this issue, in this paper we propose a rule-based reasoning algorithm which explores the highly parallel hardware of modern processors. In contrast to other approaches of parallel reasoning, our algorithm works with rules that can be de ned depending on the application scenario and thus is able to apply di erent semantics. Furthermore we show how vector-based operations can be used to implement a performant match algorithm. We evaluate our approach by applying the df, RDFS and pD* rule sets to di erent data sets and compare our results with other recent work. The evaluation shows that our approach is up to 9 times faster depending on the rule set and the used ontology and is able for example to apply the df rules to an ontology with 2.2 million triples (and 1.3 million inferred triples) in less than 6 seconds.</p>
      </abstract>
      <kwd-group>
        <kwd>rule-based reasoning</kwd>
        <kwd>GPU</kwd>
        <kwd>parallel reasoning</kwd>
        <kwd>Rete algorithm</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        The use of ontologies is widely spread whether they are used in scienti c
applications or to enable the semantic web. One key characteristic of ontologies
is the possibility to reason about the given data and to create new knowledge
by inferring facts that are implicitly given by the ontology. Depending on the
size and the structure of the data, the reasoning process may be very resource
consuming, especially with regard to the continuously growing amount of data
of the semantic web. On the other side, applications using ontologies for example
in the eld of ambient assisted living [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] or smart spaces [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] may be
time critical and need to process incoming data very fast.
      </p>
      <p>
        Di erent approaches exist to speedup the reasoning process and to provide
scalable solutions for di erent levels of expressivity, varying from improvements
on the reasoning process itself to distributed reasoning over clusters of
computational units. Especially the number of approaches using parallel structures has
increased in the past few years. The ELK reasoner [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] for example takes
advantage of multi-core and multi-processor systems of modern computers to perform
OWL EL3 reasoning. Other approaches rely on a distributed cluster and use the
MapReduce framework to perform RDFS and pD* (also know as OWL Horst)
[
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] reasoning on RDF graphs with millions of triples [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        While these approaches perform very well for a prede ned set of rules that
de ne the strength of the semantics and thus the expressivity of the resulting
ontology, the use of a cluster of machines for computation may not always be
desirable due to the high costs. In addition the prede ned set of rules that is
implemented may not always be exactly what is needed for a speci c application.
In this paper we present an approach which uses the massively parallel
hardware of a modern graphic processor unit (GPU) to apply a set of freely de ned
rules to an RDF-based ontology. While a single core of a GPU has not as much
computation power like on a CPU, a GPU provides much more processors that
are able to perform simple computation tasks in parallel. Thus it makes sense to
exploit this highly parallel hardware and to break down the complex workload
into ne grained tasks for parallelisation. Our approach uses an adapted version
of the Rete algorithm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and thus implements a forward chaining rule engine,
which is able for example to materialise the complete nite RDFS closure as well
as to apply the pD* rules in a scaleable manner on a single machine. Also this
approach is more exible than most other reasoners, because it is not dedicated
to a prede ned semantic, we achieve a very high performance due to the
parallelisation of the time consuming steps. In addition our approach is not required
to be executed on a GPU, but also can be executed on a multicore CPU.
      </p>
      <p>The main contribution of this paper will be to show how the Rete algorithm
can be used to reason on ontologies in a highly parallel manner. The use of a
rule-engine-based approach allows us to provide a reasoner that is not dedicated
to one prede ned semantic nor to a speci ed rule order, and thus can be used for
di erent rule sets of various complexity and semantics. We are also going to
introduce a concept for an e cient vector-based match algorithm which is one key
factor for the performance of our reasoner, called AMR (act-mobile reasoner).
In the next section we are starting with taking a deeper look on the related work
on high performance and parallel reasoning. In section 3 we describe the Rete
algorithm and show, how it can be used for parallel inferencing on parallel
processors. We will also provide more details on the OpenCL4 programming model
and show how this has an impact to our approach. To evaluate our concept we
use di erent rule sets applied to some well known ontologies of di erent sizes.
Finally we are going to discuss our results and give a conclusion.
3 http://www.w3.org/2007/OWL/wiki/EL
4 OpenCL: open standard for parallel programming of heterogeneous systems,
http://www.khronos.org/opencl/</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Particularly with regard to large ontologies of hundreds of millions of triples
recently the use of clusters for applying nite rules like for RDFS or OWL Horst
semantics were a focus of interest in the research community. In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]
the authors presented WebPie, an inference engine based on the open-source
MapReduce implementation Hadoop, which is able to compute RDFS as well as
the pD* semantics on data sets containing billions of triples. The parallelisation is
achieved by encoding the necessary rules as a set of Map and Reduce operations
which are executed in a given order while the distribution of the workload is
handled by Hadoop. Other papers propose similar approaches also based on
MapReduce di ering in the implemented semantics like OWL 2 EL [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] or Fuzzy
pD [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Another approach relying on multiple computing machines is presented
in [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], where a divide-conquer-swap strategy is applied. The input data is stored
on a shared location and divided into smaller chunks which are processed by a
grid of compute nodes. The results are exchanged between those nodes for further
processing. Just like the previous mentioned approaches, the strategy relies on
a prede ned semantic. Another strategy for parallel reasoning is presented in
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] where the input data is also partitioned and processes running on a cluster
computing the nite RDF schema closure for each partition.
      </p>
      <p>
        Besides the use of a cluster to increase the scaleability, other approaches try
to take advantage of the parallel structures of a single machine like available
through modern multicore CPUs and GPUs. [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] for example propose
an approach for concurrent classi cation (TBox) and parallel ABox reasoning for
OWL 2 EL. Another interesting reasoning mechanism is presented in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], where
the df vocabulary, which represents a subset of the RDFS rules, is encoded to
be applied highly parallel on the graphic processor. While this approach does
not only consider ABox or TBox information of an ontology and shows great
performance, it still relies on a pre-de ned semantic. Nevertheless, this work is
most related to our work and will be used for evaluation in section 4.
      </p>
      <p>As outlined by the discussed approaches, many scaleable solutions exists for
reasoning on a cluster as well as on a single machine which support di erent
semantics. Nevertheless, as far as we know, there is no scaleable solution using
parallel inferencing for a general purpose reasoning process, where the semantic
can be de ned by the application in terms of simple rules, irrespective of whether
these semantics are based on a RDFS or OWL pro le or include application
speci c rules.
3
3.1</p>
    </sec>
    <sec id="sec-3">
      <title>Parallelising Rule Execution</title>
      <p>OpenCL programming model
OpenCL is a heterogeneous programming framework that allows to develop
applications that execute across a range of device types and supports a wide range
of parallelism. While the device refers to the typically parallel processors like a
multicore CPU or a GPU, the host can be seen as the outer control logic that
prepares the execution of logic on the device. The logic executed on the device
is called kernel and thus is that part of an OpenCL program, that typically is
executed in parallel. To parallelise an application, the workload needs to be
partitioned into small chunks where each chunk can be computed by the same code
in parallel. Each chunk is handled by a work-item which runs in its own thread
and has a unique global identi er which can be used to identify that part of the
input data, that shall be processed by a work-item. In addition, work-items are
grouped into work-groups which have a limited size but can share local
memory which can be accessed much faster than global memory. Nevertheless, local
memory is very limited such that it needs to be used wisely. To achieve a high
performance it is important to have a kernel with as less control structures that
might be evaluated by two work-items in a di erent way as possible, because
this would lead to idling threads during the parallel execution.
3.2</p>
      <p>
        The Rete algorithm
What distinguishes the AMR reasoner from the aforementioned parallel
reasoners is that we do not know which rules shall be applied to an ontology and
thus can not provide dedicated methods that each implement one rule. Thus,
we have to implement a generic rule engine which is able to handle ontological
data. One widely used algorithm for this complex task was provided by Charles
L. Forgy [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], the Rete Match algorithm, which is able to nd all the objects
matching a given pattern. The Rete algorithm builds a network of nodes where
each node corresponds to a pattern occurring on the left hand side (LHS) of a
rule (that part, that de nes the match conditions). Thus, each pattern on the
LHS corresponds to a match condition such as (?p rdfs:domain ?c). For each
single pattern an alpha node is created, while more than one pattern on the LHS
in addition leads to at least one beta node. Thus, a beta node always has more
than one pattern. For each node a list of matching objects is stored which is
created by propagating the working memory and thus the input data through
the network.
      </p>
      <p>Considering the following rules from the RDFS semantic:</p>
      <p>(?x ?p ?y) ! (?p rdf:type rdf:Property)
(?x ?p ?y) (?p rdfs:domain ?c) ! (?x rdf:type ?c)
(R1)
(R2)
The Rete algorithm would create one alpha node from the left hand side of the
rst rule R1, and one more alpha node for the second pattern of the second rule
R2. Because the rst pattern of R2 is equal to the pattern of R1, both rules
would share one node. Because R2 consists of two patterns, in addition one beta
node would be created connecting the two other alpha nodes. Further considering
the following working memory from a simple university example, which contains
three triples consisting of a subject, predicate and object (s; p; o):</p>
      <sec id="sec-3-1">
        <title>Bob uni:publishes Paper1 (WM1) 36</title>
      </sec>
      <sec id="sec-3-2">
        <title>Alice uni:publishes Paper2 uni:publishes rdfs:domain Researcher (WM2) (WM3)</title>
        <p>The Rete network resulting by parsing R1 and R2 to alpha- and beta nodes and
propagating the working memory through the network can be seen in gure 1.
α1
WM1
WM2
WM3</p>
        <p>β1
WM1
WM2</p>
        <p>WM3
WM3
α2
WM3</p>
        <p>The Rete network in gure 1 has three nodes in total, while the nal node
of R1 is 1 and the nal node of R2 is 1. A nal node always represents a
complete rule such that the results stored by that node can be used to re the
corresponding rule. Thus, R1 would re three times and produce two new triples
(and one duplicate) while R2 would re two times and produce also two new
triples. The nal working memory would be extended by the following triples:
uni:publishes rdf:type rdf:Property
rdfs:domain rdf:type rdf:Property</p>
      </sec>
      <sec id="sec-3-3">
        <title>Bob rdf:type Researcher</title>
      </sec>
      <sec id="sec-3-4">
        <title>Alice rdf:type Researcher</title>
        <p>(WM4)
(WM5)
(WM6)
(WM7)
The new triples would also be propagated through the network until no new
triples are derived and the rule engine has nished his job ( xpoint iteration).
3.3</p>
        <p>De nitions
As can be seen from the previous introduction, the Rete algorithm basically
performs three steps to apply rules to a working memory: an alpha-match, a
beta-match and the rule ring. Before continuing, we have to de ne some terms
that are used throughout this paper:
De nition 1. An alpha node speci es a node of the Rete network that directly
corresponds to one pattern of a rule. An alpha node always has exactly one
pattern that consists of three (pattern-)terms.
De nition 2. A beta node speci es a node of the Rete network that has exactly
two parents (p1 and p2). The parents in turn may be alpha- or beta nodes.
Depending on the position of a beta node in the network, it has a depth &gt;= 1
that is calculated by the longest distance to an alpha node.</p>
        <p>De nition 3. The pattern-width de nes the number of terms of the pattern
of one node. Thus, an alpha node always has a pattern-width of 3. A beta node
whose parents are both alpha nodes has a pattern-width of 6 and so on.
De nition 4. The matches of a node are referred to as a set A. Accordingly
the number of matches is de ned by jAj.</p>
        <p>De nition 5. Match-conditions are created for each node and de ne a
function C(m; :::) with m 2 A, that evaluates to true if a triple or a set of triples (in
case of a beta node) conform to the pattern of that node.
3.4</p>
        <p>
          Rete on Parallel Hardware
To parallelise the overall reasoning process, di erent strategies were proposed by
other approaches which essentially consists of data partitioning and rule
partitioning [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. While data partitioning means the dividing of the data into smaller
units and the independent processing of each unit, the rule partitioning approach
applies only a subset of the overall rules to the complete data set. Both types
of parallelisation require a synchronisation of the results and especially the data
partitioning approach may produce duplicates. Nevertheless, these drawbacks
might be unavoidable in a cluster-based environment where each of the parallel
processes runs in an independent environment.
        </p>
        <p>Using the parallel structures on a single computer, a parallelisation can take
place on a di erent level than rule or data partitioning. This is because in the
case of a single computer a synchronisation of interim results of the parallel
threads can be performed much more e cient by a single host application.
Furthermore, to achieve a high performance for example on a GPU it is important
to have lots of tasks that can be computed independently and where each task
consists only of a small workload. In order to take these considerations into
account, our approach parallelises the reasoning process on a deeper level. For
alpha-matching, one kernel is executed where for each triple that needs to be
considered a single thread (work-item) is created. This thread checks whether
the corresponding triple matches one or more of the alpha node patterns and
creates a list containing the matching nodes. Thus, each thread needs to iterate
over all of the alpha nodes. Finally the resulting list can simply be transformed
to create a match-list for each alpha node containing the matching triples. That
means that the number of threads that are executed in parallel is equal to the
number of triples available for processing.</p>
        <p>Because of its complexity, the beta-match is handled in a di erent way. Just
like the rule partitioning approach it would not be desirable to simply execute one
thread for each beta node in parallel because of the low number of nodes and the
high computation load. On the other side the number of match-steps, that need
to be computed heavily rises with the number of matches of the parents of one
beta node. Considering the example from gure 1, for 1 only 3x1 possibilities
needed to be computed. If 1 and 2 both had ten matches, the number of
possible combinations would arise to 100. By taking this complexity as well as
the fact, that the GPU is able to handle millions of work-items in a very e cient
way, into account, it is a natural way to apply the parallelism for the beta-step
during this match-algorithm. That means, that for each match of one parent of
a beta node a work-item is created which iterates over all matches of the second
parent of the beta node. Thus, the number i of work-items is de ned by:
and the number of iterations j each work-item has to perform by:
Def :</p>
        <p>i = jAp1j; with jAp1j &gt;= jAp2j
Def :</p>
        <p>j = jAp2j; with jAp2j &lt; jAp1j</p>
      </sec>
      <sec id="sec-3-5">
        <title>Finally the match-algorithm for a beta node is de ned as:</title>
        <p>C(mi; mj ); mi 2 Ap1; mj 2 Ap2
where mi denotes the parallelism and i corresponds to the rank of the thread.
By de ning jAp1j &gt;= jAp2j and jAp2j &lt; jAp1j we ensure, that the number of
parallel threads is at least as high as the number of iterations each thread has
to perform, which can be computed more e cient on the GPU. Furthermore the
match-algorithm needs to be performed for Ap1 Ap2.
(1)
(2)
(3)
α1
WM1</p>
        <p>...</p>
        <p>WM2000
β1</p>
        <p>α2
WM2001</p>
        <p>...</p>
        <p>WM2700
1 2
0 0
0 0
2 2
MW MW ...
Programming parallel hardware with OpenCL imposes some restrictions. First
of all, strings can not be computed in an e cient way and thus are not suitable
for a fast match-algorithm. Thus, all triple terms s; p; and o are transformed
into integer values, where each term is mapped to one value. Non literal terms
are mapped only once such that there is only one index for each term, even if
it is used a multiple times within the ontology. Furthermore, the memory used
by a kernel needs to be allocated before kernel execution with the consequence,
that each alpha- and beta-step has to be performed twice. The rst time, only
the number of matches is calculated (called match-count) while for the second
execution the number of matches is used to allocate the required memory for all
matches and thus the nal results can be created. Finally, the overall reasoning
process consists of an alpha-count where the resulting triples of an alpha match
are calculated, an alpha-match, where the nal results of the alpha-step are
created and an beta-count and beta-match for each depth (see de nition 2 in
section 3.3). The last step is to re the rules using the working memories of
the nal nodes and to create new triples. Eventually the algorithm iterates over
the complete process until no new triples are derived ( xpoint). In addition the
beta-count processes as well as the beta-match processes can be executed at
the same time for all beta nodes of one depth (non blocking operations). This
allows an out-of-order execution where the GPU may de ne the order of given
commands to optimise the throughput and is the reason, why we rst start all
beta-count and beta-match operations and read the result back in a following
loop.
The most computation intensive task during the reasoning process is the
betamatching. Thus it is essential to speed this task up and make the computation as
simple as possible. To achieve this, our approach uses a vector-based operation
to check, if a combination of triples matches the pattern of a beta node.
Vectorbased operations can be computed by a GPU in a very e cient way which is
why it is desirable to use them. To do this, rst of all we use so called unrolled
kernels for rules with up to four patterns (which is for example the max
patternwidth occurring in the OWL Horst rule set). This means, that instead of using
loops iterating over a de ned number of items (where in this case the number
of items depends on the number of patterns that a beta node has) we just write
the command to execute as often as it is needed. The kernels on the other hand
are executed by using a kernel implementation depending on the characteristics
of the beta node. This also allows us to exactly know the pattern-width of each
parent of a beta node during kernel implementation which is the basis for the
vector-based match algorithm.
(R2)</p>
        <p>Recapturing rule R2, where the nal beta node has two alpha nodes as
parents, an unrolled kernel can be executed which assumes a pattern-width of three
for both parents. Furthermore it can be seen, that to verify a match only the
second term of the rst pattern and the rst term of the second pattern needs
to be checked, such that the value for ?p in the rst pattern is equal to the value
of ?p of the second pattern. To do this, in each work-item a vector v1 is created
such that those elements from m1, that need to be compared to elements of
m2, are placed at the location in v1 where their corresponding element of m2
is located. In addition the elements in v1 need to be negated such that a later
performed addition can result in a null vector if the elements of both vectors
are equal except of their sign. Besides v1, another vector u is created which has
the same number of elements than v1 and is lled with elements equal to 1 at
those positions, where the second pattern holds an element that needs to be
considered to verify a match. All other elements of u are de ned as 0. Finally
the loop which runs over all matches of Ap2 only has to create a vector v2 which
holds all elements of m2. This vector is used to verify for a match as follows:
(v2
u) + v1
(4)
This operation is performed in a component based manner (meaning that a
concurrent, component based multiplication as well as a component based addition
is performed) and results in a null-vector, if a match was found. Otherwise, at
least one element of the resulting vector is unequal to 0. Furthermore only a
simple operation and a minimum of data transfer is necessary within the inner
loop of a kernel, which allows a very e cient execution even for a large jAp2j.</p>
        <p>To continue the example illustrated in gure 1 the working memory of 1
and 2 looks like depicted in gure 3. To improve the readability, not only the
α1
working memory reference (row 1) is given, but also the data (row 2) as well as
the internal representation of that data (row 3). The internal representation, like
described before, is a mapping of each subject, predicate and object to an integer
value, which is used for computation. Based on the aforementioned description,
three parallel threads would be executed to calculated A 1. Because every item
in the working memory of 1 matches the pattern (?x ?p ?y) and every item
in the working memory of 2 matches the pattern (?p rdfs:domain ?c) (see
rule R2) the only calculation necessary to see, if a combination of an 1 and an
2 match also is a match of the beta node 1, is to compare the corresponding
?p values. To do this, in each thread a constant vector v1 is created using the
corresponding match of 1. Looking at the rst thread, v1 holds the negated</p>
        <p>001
021</p>
        <p>001
Using the same calculation to match WM3 against WM3 (WM3 is a match of
1 as well as of 2) would not result in a null vector and thus would not be a
match:
?p-value of the WM1 element, which is positioned at the rst vector component,
because the corresponding ?p-element of the 2 match is also positioned at the
rst element. This results in v1 = ( 2; 0; 0). Because only the rst component
of the vectors need to be considered for R2 (only the ?p elements have to be
compared), the vector u is created as u = (1; 0; 0). Now, for each element of
A 2 a vector v2 is created which simply holds the numeric representation of the
corresponding triple such that v2 results in v2 = (2; 6; 7). The nal (component
based) calculation is as follows:
In addition to the unrolled and vector based kernels, we also implemented kernels
that can be used with a pattern-width larger than 4 which allows to write even
more complex rules with an arbitrary size.
4
4.1</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
      <p>
        df, RDFS and OWL Horst
Because our approach is able to handle a given set of rules independent of the
semantic that those rules belong to, we used three di erent rule sets with a
varying complexity, which often were implemented by other reasoners in a manual
way. The df vocabulary [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] is a simpli ed version of the RDFS semantic which
consists of all rules of RDFS with at least two rule body-terms. This semantic
imposes a more e cient reasoning, while the results of the missing rules are
supposed to be created on the y by a reasoner, if resources are queried. These
rules were also implemented by the MapReduce-based approach presented in
[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] as well as the by the GPU based approach proposed in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Due to space
restrictions, we also refer to [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] for the df rules.
      </p>
      <p>
        The second rule set consists of the complete RDFS rules like de ned by
the W3C5. This rule set consists of 13 rules with one or two antecedents and
is used in several other publications for evaluation purpose [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. The last
rule set formally known as pD* was proposed by Herman J. ter Horst which
incorporates RDFS and D entailment and extends these semantics with some
basic support for OWL [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. It provides a complete set of entailment rules and
has become a promising ontology language for the semantic web because of its
5 http://www.w3.org/TR/rdf-mt/#RDFSRules
(5)
(6)
expressiveness on the one side and its relatively low computation complexity on
the other side. For a complete overview of the 22 rules (some of them can be
combined as they share the same antecedents resulting in 16 rules) we refer due
to space restrictions to [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
4.2
      </p>
      <p>
        Test Environment
To achieve comparable results we choose ontologies with various sizes that were
already used to evaluate other approaches. Thus, we are using the Vicodi6
ontology which is an ontology of European history used for semantical indexing of
historical documents [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. The TBox is of a moderate size while the ABox
contains a large number of instances. In total, the ontology consists of 146,280 triples
and thus is compared to our second ontology, known from the Lehigh University
benchmark (LUBM)7, a small sized ontology. LUBM is a benchmark ontology
and de nes an TBox for a university scenario. A generator allows to generate
university data sets while the number of universities that are created can be
dened as an input. Thus, we created 3 LUBM data sets with 268,794 triples which
contains two universities (LUBM2), a LUBM5 ontology with 727,265 triples and
a LUBM10 ontology with 1,480,366 triples. To use another large ontology we
used the DBPedia8 3.7 which is a lightweight ontology containing structured
data extracted from Wikipedia. For this data set we used a similar setup like
described by [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] containing the DBpedia Ontology, Infobox Types and Infobox
Properties. We also limited the size of the data set in a similar way by scaling
the instance triples by 1=8th, 1=16th, and 1=32nd of the original size.
      </p>
      <p>
        Our implementation is Java based and integrates with Jena9 applications.
Thus, we use the Jena framework to parse the ontologies and create our own
data structures for the reasoning process by reading an ontology graph from
Jena. To be able to use OpenCL from our Java application, we use jocl10 as
Java bindings. The tests are performed on a work station with a 2.0 GHz Intel
Xeon processor with 6 cores and an AMD 7970 gaming graphic card with 3GB
of memory running an Ubuntu 12.04. In order to compare our results to other
approaches, we performed the same tests with the df rule set with the GPU
based reasoner proposed in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In the following we refer to that reasoner as
grdfs reasoner. Other parallel reasoners also implementing the complete RDFS or
OWL Horst rules on a single computer considering the ABox as well as the TBox
were not available. We also run our experiment using the non-vector-based kernel
and compare the results with the vector-based version. For each experiment the
total time except data transformation (i.e. loading the Jena-graph to AMR and
le parsing for grdfs) were measured, which also includes the non parallel rule
ring. A dedicated kernel execution time on the GPU is not given due to the
6 http://www.vicodi.org/about.htm
7 http://swat.cse.lehigh.edu/projects/lubm/
8 http://dbpedia.org/About
9 http://jena.apache.org/
10 http://www.jocl.org/
execution of multiple kernels which are launched asynchronously such that no
precise information are available. Furthermore each test was performed ten times
while the average is presented.
4.3
      </p>
      <p>Results and Comparison
The rst experiment shows the impact of using the vector-based operations
during the beta-match. Therefore we used all three LUBM data sets and executed
it using the pD* rule set with the naive implementation of the kernel and with the
vector-based kernel. Notice that both kernels are unrolled and the only di erence
is, that the second kernel uses vector operations instead of calculating each
element in a single step. We choose the pD* vocabulary because it is the most
computation intensive one of the three used rule sets.
)200
s
(e100
itm 50
ign 25
n
sao 10
e
r
3
2</p>
      <p>5
LUBM data set
nvb
vb
10</p>
      <p>LUBM
triples entailed
nvb
vb</p>
      <p>speedup
2 268,794 38,584 6.05 s 3.72 s
5 727,265 102,618 40.01 s 23.69 s
10 1,480,366 207,677 160.80 s 95.39 s</p>
      <p>The results in gure 4 show that the vector-based kernel provides a constantly
better performance. The calculation using the vector-based method is round
about 1.7 times faster than using the naive implementation which simply applies
the comparison for two matches of the corresponding parent-nodes using single
operations. On a di erent hardware (MacBook Pro) even a speedup of more
than 4 could be measured using the vector-based operation.</p>
      <p>The next test shall show the scaleability of our approach. Therefore we use
the Vicodi ontology as well as the LUBM2 ontology and run our algorithm on
the CPU of our test environment. The CPU has less cores than the GPU and
is not that fast, but OpenCL allows us to use only a de ned number of cores
of the CPU. This way it is possible to show the speedup which is achieved by
increasing the number of used cores. Because the used CPU has 6 cores where
each core can run 2 threads through hyper-threading (resulting in 12 virtual
cores), we applied the pD* vocabulary to the input data using 1 to 12 cores.</p>
      <p>As can be seen from gure 5 the speedup nearly doubles with a doubling
of the number of used processors for both datasets until 6 cores are used. The
time (s)
300
250
200
150
100
50</p>
      <p>Vicodi</p>
      <p>LUBM2
1 2 3 4 5 6 7 8 9 10 11 12 cores
other 6 cores still contribute to a better performance, but the impact is not that
reasonable anymore. We assume that this is because two (virtual) cores always
share some resources on one processor and thus do not provide such a speedup
as is would be achieved if each core had physically exclusive resources.</p>
      <p>
        Finally we want to compare our results to other approaches. For this we
performed a set of tests with the GPU based grdfs reasoner proposed in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]
as well as with AMR. Both reasoners used the same hardware like described
before. Because the grdfs reasoner implements the df semantic, the tests were
only performed using the corresponding rules. While we can not prove that our
implementation works correct, we still get exactly the same results using the
general purpose rule engine provided by the Jena framework for ontologies with
a limited size which that rule engine is able to handle. The largest ontology we
were able to test with Jena using the df vocabulary was the LUBM2 ontology
with 268,794 triples (and 146 entailed triples), which took about 27 minutes,
while our approach entails exactly the same triples in about 270 ms. We also
tested for example the Vicodi ontology using the Jena framework and the RDFS
vocabulary which took about 12 minutes and inferred 127,886 triples to a total
of 274,233. Our approach again provides exactly the same results using the GPU
in less than a second. The nal results of our tests including parallel and serial
work (the complete reasoning process excluding parsing) are listed in table 1.
triples
      </p>
      <p>AMR
grdfs
entailed AMR entailed grdfs speedup
LUBM2 268,794 276 ms 1,383 ms 146 22 5.02
LUBM5 727,265 447 ms 3,153 ms 146 22 7.05
LUBM10 1,480,366 676 ms 6,207 ms 146 22 9.22
DBPedia 1/32 1,087,364 3,061 ms 7,554 ms 1,087,364 1,085,309 2.47
DBPedia 1/16 2,276,510 5,931 ms 14,681 ms 1,936,950 1,934,887 2.48
DBPedia 1/8 4,523,729 10,954 ms 27,739 ms 3,083,513 3,081,433 2.53
Table 1: Reasoning time for di erent data sets using AMR and grdfs reasoner</p>
      <p>The experiment shows that our approach provides a speedup of a factor of
up to 9.2 compared to the grdfs reasoner. While the speedup for the LUBM data
sets is more signi cant than for the DBPedia data sets, it is still more than two
times faster. The di erent speedup factor results from the fact that using the
DBPedia data sets many new triples are inferred, such that up to 60% of the
reasoning time of the AMR reasoner is needed for the serial implemented rule
ring, which also includes operations like dictionary lookup to not infer duplicate
triples.</p>
      <p>
        Further results from other work for comparison can be used for example from
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], where the authors evaluated an approach to parallelise the RDFS closure
using an Opteron blade cluster, each server in the cluster having two dual-core
2.6 GHz AMD Opteron processors. The LUBM10k/1024 data set from that
paper has a slightly smaller size and a similar complexity like the LUBM10 data
set used for this paper. While the approach from [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] took about 2 seconds in
total to calculate the closure using a cluster of 128 cores, our approach calculates
the RDFS closure on a single machine in about 3 seconds. On a MacBook Pro
with a Core i7 processor, which has a less powerful GPU but due to the CPU a
faster architecture for serial calculations, the same test could even be nished in
less than 2.6 seconds using the AMR reasoner. Nevertheless, the approach from
[
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] is able to handle much larger data sets.
5
      </p>
    </sec>
    <sec id="sec-5">
      <title>Discussion and Future Work</title>
      <p>
        The results in section 4.3 show that our approach o ers a good and scaleable
performance using a single computer, also for data sets with millions of triples.
Nevertheless, there are still restrictions regarding the size of an ontology. On
the one hand for a performant execution the main memory of the host
computer needs to be large enough to hold the complete ontology as well as inferred
matches and data structures that are used for the rule execution. On the other
hand the use of integers for the created index structures limits the number of
processable triples. This limitation is even stronger regarding the matches-arrays
of the single nodes which easily needs to hold a multiple of elements as triples
are available. To overcome this issues, the use of 64bit datatypes as well as
appropriate collection types to hold the triples and matches should be considered.
In addition the use of collection oriented matching like describe in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] could
be considered, where matches are calculated and stored in a collection-oriented
way instead of using single tuples. Furthermore a partitioning strategy could
be implemented that allows to distribute the workload of large ontologies over
multiple GPUs as well as over multiple machines. Thus, a combination of the
cluster-based approach used in [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and the low level parallelisation like described
in this paper might be an interesting approach. Another optimisation might be
possible by parallelising the rule ring, too, which will require thread safe data
structures and a concept to detect duplicates.
      </p>
      <p>
        Besides optimisations regarding the performance and the ability to handle
larger data sets, in the future we are also going to investigate how we can
extend the functionality of our rule-based system to also support operants like
greaterT han(?x; ?y) within a rule body. This way our system would o er much
more exibility for scenarios with application speci c rules like used in di erent
kinds of smart environments like [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
6
      </p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>In the past most of the approaches to parallelise the reasoning process have
focused on distributing the workload over multiple machines to use a large number
of processors. Only a few approaches already considered the use of the parallel
structures available on a single machine. All approaches have in common, that
they implement a de ned set of rules and can not be con gured in an application
speci c way. In this paper we proposed a rule-based approach that is
independent from a speci c semantic and uses the parallel structures of modern CPUs
as well as of GPUs. The high performance is achieved by parallelising the Rete
algorithm and breaking the match-steps into ne grained tasks which can be
computed highly parallel. We also introduced a vector-based operation to
compute the beta matches, which easily doubles the performance of the algorithm
running on a GPU. Finally our results show, that the approach scales well with
the number of used cores and can apply a set of rules to an ontology in a very
performant way. Thus the parallelisation of a generic rule-based approach to
apply rules on ontological data can be very e cient, if the workload is partitioned
into an adequate number of units which can be computed highly parallel.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aus n</surname>
          </string-name>
          , D.,
          <string-name>
            <surname>Castanedo</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lopez-de Ipin</surname>
          </string-name>
          ~a, D.:
          <article-title>Benchmarking results of semantic reasoners applied to an ambient assisted living environment</article-title>
          .
          <source>In: Proceedings of the 10th international smart homes and health telematics conference on Impact Ananlysis of Solutions for Chronic Disease Prevention and Management. ICOST'12</source>
          , Berlin, Heidelberg, Springer-Verlag (
          <year>2012</year>
          )
          <volume>282</volume>
          {
          <fpage>285</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Agostini</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bettini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Riboni</surname>
            ,
            <given-names>D.:</given-names>
          </string-name>
          <article-title>A performance evaluation of ontology-based context reasoning</article-title>
          .
          <source>In: Pervasive Computing and Communications Workshops</source>
          ,
          <year>2007</year>
          . PerCom Workshops '
          <fpage>07</fpage>
          . Fifth Annual IEEE International Conference on. (
          <year>2007</year>
          ) 3{
          <fpage>8</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Pantsar-Syvaniemi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Simula</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ovaska</surname>
          </string-name>
          , E.:
          <article-title>Context-awareness in smart spaces</article-title>
          .
          <source>In: Computers and Communications (ISCC)</source>
          ,
          <source>2010 IEEE Symposium on. (</source>
          <year>2010</year>
          )
          <volume>1023</volume>
          {
          <fpage>1028</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Reinisch</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ko</surname>
            <given-names>er</given-names>
          </string-name>
          , M.,
          <string-name>
            <surname>Kastner</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Thinkhome: A smart home as digital ecosystem</article-title>
          .
          <source>In: Digital Ecosystems and Technologies (DEST)</source>
          ,
          <year>2010</year>
          4th IEEE International Conference on. (
          <year>2010</year>
          )
          <volume>256</volume>
          {
          <fpage>261</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Peters</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brink</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sachweh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , Zundorf, A.:
          <article-title>Performance considerations in ontology based ambient intelligence architectures</article-title>
          .
          <source>In: Proceedings of the 4th International Symposium on Ambient Intelligence</source>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , Krotzsch,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Simanc k</surname>
          </string-name>
          , F.:
          <article-title>ELK: a reasoner for OWL EL ontologies</article-title>
          . System description, University of Oxford.
          <source>In: Technical Report</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7. ter Horst, H.J.:
          <article-title>Completeness, decidability and complexity of entailment for RDF Schema and a semantic extension involving the OWL vocabulary</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          <volume>3</volume>
          (
          <issue>2</issue>
          -3) (
          <year>October 2005</year>
          )
          <volume>79</volume>
          {
          <fpage>115</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Urbani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maassen</surname>
          </string-name>
          , J., van
          <string-name>
            <surname>Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bal</surname>
          </string-name>
          , H.:
          <article-title>Owl reasoning with webpie: calculating the closure of 100 billion triples</article-title>
          .
          <source>In: Proceedings of the 7th international conference on The Semantic Web: research and Applications - Volume Part I. ESWC'10</source>
          , Berlin, Heidelberg, Springer-Verlag (
          <year>2010</year>
          )
          <volume>213</volume>
          {
          <fpage>227</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Urbani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oren</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Scalable distributed reasoning using mapreduce</article-title>
          .
          <source>In: Proceedings of the 8th International Semantic Web Conference. ISWC '09</source>
          , Berlin, Heidelberg, Springer-Verlag (
          <year>2009</year>
          )
          <volume>634</volume>
          {
          <fpage>649</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Forgy</surname>
            ,
            <given-names>C.L.</given-names>
          </string-name>
          :
          <article-title>Rete: a fast algorithm for the many pattern/many object pattern match problem</article-title>
          . In Raeth, P.G., ed.:
          <article-title>Expert systems</article-title>
          . IEEE Computer Society Press, Los Alamitos, CA, USA (
          <year>1990</year>
          )
          <volume>324</volume>
          {
          <fpage>341</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Maier</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mutharaju</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Distributed reasoning with EL++ using mapreduce</article-title>
          .
          <source>Technical report</source>
          , Kno.e.sis Center, Wright State University, Dayton, Ohio (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Reasoning with large scale ontologies in fuzzy pd* using mapreduce</article-title>
          .
          <source>Computational Intelligence Magazine, IEEE</source>
          <volume>7</volume>
          (
          <issue>2</issue>
          ) (
          <year>2012</year>
          )
          <volume>54</volume>
          {
          <fpage>66</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Oren</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anadiotis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siebes</surname>
          </string-name>
          , R., ten
          <string-name>
            <surname>Teije</surname>
          </string-name>
          , A.,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Marvin: A platform for large-scale analysis of semantic web data</article-title>
          .
          <source>In: Proceedings of the WebScience '09</source>
          ,
          <string-name>
            <surname>Society</surname>
          </string-name>
          On-Line (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Weaver</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Parallel materialization of the nite RDFS closure for hundreds of millions of triples</article-title>
          . In Bernstein,
          <string-name>
            <given-names>A.</given-names>
            ,
            <surname>Karger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Heath</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            ,
            <surname>Feigenbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            ,
            <surname>Maynard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            ,
            <surname>Motta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            ,
            <surname>Thirunarayan</surname>
          </string-name>
          , K., eds.:
          <source>The Semantic Web - ISWC 2009. Volume 5823 of Lecture Notes in Computer Science</source>
          . Springer Berlin Heidelberg (
          <year>2009</year>
          )
          <volume>682</volume>
          {
          <fpage>697</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kazakov</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , Krotzsch,
          <string-name>
            <given-names>M.</given-names>
            ,
            <surname>Simanc k</surname>
          </string-name>
          , F.:
          <article-title>Concurrent classi cation of EL ontologies</article-title>
          .
          <source>In: Proceedings of the 10th international conference on The semantic web - Volume Part I. ISWC'11</source>
          , Berlin, Heidelberg, Springer-Verlag (
          <year>2011</year>
          )
          <volume>305</volume>
          {
          <fpage>320</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Ren</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>J.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Parallel abox reasoning of EL ontologies</article-title>
          .
          <source>In: Proceedings of the 2011 joint international conference on The Semantic Web. JIST'11</source>
          , Berlin, Heidelberg, Springer-Verlag (
          <year>2012</year>
          )
          <volume>17</volume>
          {
          <fpage>32</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Heino</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
          </string-name>
          , J.:
          <article-title>RDFS reasoning on massively parallel hardware</article-title>
          .
          <source>In: The Semantic Web ISWC 2012. Volume 7649 of Lecture Notes in Computer Science</source>
          . Springer Berlin Heidelberg (
          <year>2012</year>
          )
          <volume>133</volume>
          {
          <fpage>148</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Soma</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prasanna</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Parallel inferencing for OWL knowledge bases</article-title>
          .
          <source>In: Parallel Processing</source>
          ,
          <year>2008</year>
          . ICPP '
          <volume>08</volume>
          . 37th International Conference on. (
          <year>2008</year>
          )
          <volume>75</volume>
          {
          <fpage>82</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Muoz</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutierrez</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Minimal deductive systems for RDF</article-title>
          . In Franconi, E.,
          <string-name>
            <surname>Kifer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>May</surname>
          </string-name>
          , W., eds.:
          <source>The Semantic Web: Research and Applications. Volume 4519 of Lecture Notes in Computer Science</source>
          . Springer Berlin Heidelberg (
          <year>2007</year>
          )
          <volume>53</volume>
          {
          <fpage>67</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Nagypl</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Motik</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>A fuzzy model for representing uncertain, subjective, and vague temporal knowledge in ontologies</article-title>
          . In Meersman, R.,
          <string-name>
            <surname>Tari</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmidt</surname>
          </string-name>
          , D., eds.: On The Move to Meaningful
          <source>Internet Systems 2003: CoopIS, DOA, and ODBASE. Lecture Notes in Computer Science</source>
          . Springer Berlin Heidelberg (
          <year>2003</year>
          )
          <volume>906</volume>
          {
          <fpage>923</fpage>
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Acharya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tambe</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Collection oriented match</article-title>
          .
          <source>In: Proceedings of the second international conference on Information and knowledge management</source>
          .
          <source>CIKM '93</source>
          (
          <year>1993</year>
          )
          <volume>516</volume>
          {
          <fpage>526</fpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>