<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Scalable Nonmonotonic Reasoning over RDF data using MapReduce</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ilias Tachmazidis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Grigoris Antoniou</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giorgos Flouris</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Spyros Kotoulas</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science, University of Crete</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>IBM Research</institution>
          ,
          <addr-line>IBM</addr-line>
          <country country="IE">Ireland</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Computer Science</institution>
          ,
          <addr-line>FORTH</addr-line>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>University of Huddersfield</institution>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <fpage>75</fpage>
      <lpage>90</lpage>
      <abstract>
<p>In this paper, we present a scalable method for nonmonotonic rule-based reasoning over Semantic Web data using MapReduce. Our work is motivated by the recent unparalleled explosion of available data coming from the Web, sensor readings, databases, ontologies and more. Such datasets could benefit from the introduction of rule sets encoding commonly accepted rules or facts, application- or domain-specific rules, commonsense knowledge etc. This raises the question of whether, how, and to what extent knowledge representation methods are capable of handling huge amounts of data for these applications. We present a scalable MapReduce-based method for reasoning using defeasible stratified logics. Our results indicate that our method shows good scalability properties and is able to handle a benchmark dataset of 1 billion triples, bringing it on par with state-of-the-art methods for monotonic logics.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Recently, we have experienced a significant growth in the amount of available data published
on the Semantic Web. Billions of RDF triples from Wikipedia, the U.S. Census, the CIA World
Factbook, open government sites in the US and the UK, memory organizations like the
British Museum and Europeana, as well as news and entertainment sources such as the
BBC, are published, along with numerous vocabularies and conceptual schemas from
e-science aiming to facilitate annotation and interlinking of scientific and scholarly
data [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The recent rise of the Linked Open Data initiative [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is an answer to
the need for such large and interconnected data. RDF(S) [
        <xref ref-type="bibr" rid="ref18 ref7">18, 7</xref>
        ] has become the de
facto standard for representing such knowledge on the Semantic Web, due to its flexible
and extensible representation of information in the form of triples, which is independent
of the existence or absence of a schema.
      </p>
      <p>The amount, diversity and interlinkage of data published in this manner enables a
new generation of decision making and business intelligence applications across
domains. To fully exploit the immense value of such datasets and their interconnections,
one should be able to reason over them using rule sets that allow the aggregation,
visualization, understanding and exploitation of the raw data. Such reasoning is based on
rules which capture the RDFS or OWL inference semantics, but also rules which encode
commonsense, domain-specific, or other practical knowledge that humans possess and
would allow the system to automatically reach useful conclusions based on the provided
data, i.e., infer new and useful knowledge based on the data and their interconnections.</p>
      <p>
        The knowledge representation field has provided a rich set of semantics and
techniques to use for reasoning using such rule sets, although the focus has been on
complex knowledge structures and reasoning methods. On the other hand, RDF datasets
are much simpler, but their size raises scalability challenges that cannot be addressed
by standard approaches. For example, as described in [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], for 78.8 million statements
crawled from the Web (a small percentage of the available knowledge), the set of
inferred conclusions using the relatively simple RDFS rule set comprises 1.5 billion
triples; it is evident that coping with such amounts of data is impossible in standard,
single-machine approaches due to both memory and performance issues.
      </p>
      <p>
        To address this problem, the use of massive parallelism has been recently
proposed [
        <xref ref-type="bibr" rid="ref10 ref14 ref22 ref28 ref29 ref30">28, 22, 29, 14, 10, 30</xref>
        ], where reasoning is handled by a set of machines,
assigning each of them a part of the parallel computation. In some cases, this approach has
allowed scaling reasoning up to 100 billion triples [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. However, such approaches have
focused on monotonic reasoning, or have not been evaluated in terms of scalability [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>
        In this paper, we concentrate on nonmonotonic rule sets [
        <xref ref-type="bibr" rid="ref17 ref2">2, 17</xref>
        ]. Such rule sets
provide additional benefits because they are more suitable for encoding commonsense
knowledge and reasoning. In addition, in the case of poor quality data, monotonic
logics such as RDFS cause an explosion of trivial (and often useless) derivations, as also
identified in [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The occurrence of low quality data is very common in the context
of the Semantic Web [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], as data are fetched from different sources, which are not
controlled by the data engineer; thus, nonmonotonic reasoning is more suitable for this
context.
      </p>
      <p>
        Our previous works [
        <xref ref-type="bibr" rid="ref26 ref27">26, 27</xref>
        ] described how defeasible logic reasoning [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], a
commonly used nonmonotonic logic, can be implemented using massively parallel
techniques. In [
        <xref ref-type="bibr" rid="ref26 ref27">26, 27</xref>
        ] we adopted the MapReduce framework [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], which is widely used
for parallel processing of huge datasets. In particular, we used Hadoop, an open-source
implementation of the MapReduce framework, with an extensive user list including
companies like IBM, Yahoo!, Facebook and Twitter (http://wiki.apache.org/hadoop/PoweredBy).
      </p>
      <p>
        The approach of [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] addressed reasoning for stratified rule sets. Stratification is a
well-known concept employed in many areas of knowledge representation for efficiency
reasons, e.g., in tractable RDF query answering [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], Description Logics [
        <xref ref-type="bibr" rid="ref11 ref20 ref4">4, 11, 20</xref>
        ]
and nonmonotonic formalisms [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], as it has been shown to reduce the computational
complexity of various reasoning problems.
      </p>
      <p>
        This paper is the first attempt to evaluate the feasibility of applying nonmonotonic
reasoning over RDF data using mass parallelization techniques. We present a
technique for materialization using stratified defeasible logics, based on MapReduce and
focusing on performance. A defeasible rule set for the LUBM benchmark
(http://swat.cse.lehigh.edu/projects/lubm/) is presented, which is used to evaluate our
approach. We present scalability results indicating that our approach scales superlinearly
with the data size. In addition, since load-balancing is a significant performance
inhibitor in reasoning systems [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], we show that our approach performs very well in this respect for the considered
dataset, distributing data fairly uniformly across MapReduce tasks. Compared to our
previous work with similar content, we extend [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ] by considering multi-arity predicates, and improve [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] by experimenting over a standard RDF data benchmark (LUBM).
      </p>
      <p>Algorithm 1 Wordcount example
map(Long key, String value):
// key: position in document
// value: document line
for each word w in value
EmitIntermediate(w, “1”);
reduce(String key, Iterator values):
// key: a word
// values: list of counts
int count = 0;
for each v in values
count += ParseInt(v);
Emit(key, count);</p>
      <p>The rest of the paper is organized as follows. Section 2 briefly introduces the
MapReduce framework and defeasible logic. The algorithm for defeasible reasoning using
MapReduce is described in Section 3, while Section 4 presents our experimental results.
We conclude in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>Preliminaries</title>
      <sec id="sec-2-1">
        <title>MapReduce</title>
        <p>
          MapReduce is a framework for parallel processing over huge datasets [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Processing
is carried out in two phases, a map and a reduce phase. For each phase, a set of
user-defined map and reduce functions are run in a parallel fashion. The former performs a
user-defined operation over an arbitrary part of the input and partitions the data, while
the latter performs a user-defined operation on each partition.
        </p>
        <p>MapReduce is designed to operate over key/value pairs. Specifically, each Map
function receives a key/value pair and emits a set of key/value pairs. All key/value pairs
produced during the map phase are grouped by their key and passed to the reduce phase.
During the reduce phase, a Reduce function is called for each unique key, processing
the corresponding set of values.</p>
        <p>Probably the most well-known MapReduce example is the wordcount example. In
this example, we take as input a large number of documents and the final result is the
calculation of the number of occurrences of each word. The pseudo-code for the Map
and Reduce functions is depicted in Algorithm 1.</p>
        <p>During the map phase, each map operation gets as input a line of a document. The
Map function extracts words from each line and emits, for each word w, a pair recording
that w occurred once (“1”). Here we do not use the position of each line in the document,
so the key in Map is ignored. However, a word can be found more than once in a line; in
this case we emit a &lt;w, 1&gt; pair for each occurrence. Consider the line “Hello world.
Hello MapReduce.”. Instead of emitting a pair &lt;Hello, 2&gt;, our simple example emits
&lt;Hello, 1&gt; twice (pairs for the words world and MapReduce are emitted as well). As
mentioned above, the MapReduce framework will group and sort pairs by their key.
Specifically, for the word Hello, a pair &lt;Hello, &lt;1,1&gt;&gt; will be passed to the Reduce
function. The Reduce function has to sum up all occurrence values for each word,
emitting a pair containing the word and the final number of occurrences. The final result
for the word Hello will be &lt;Hello, 2&gt;.</p>
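<p>The wordcount flow described above can also be simulated in a few lines of plain Python (a single-process sketch for illustration only; the actual framework distributes the map, grouping and reduce steps across machines):</p>
<p>
```python
from collections import defaultdict

def map_fn(key, value):
    # key: position in document (ignored); value: document line
    return [(w, 1) for w in value.split()]

def reduce_fn(key, values):
    # key: a word; values: list of counts
    return (key, sum(values))

def mapreduce(lines, mapper, reducer):
    groups = defaultdict(list)
    for pos, line in enumerate(lines):
        for k, v in mapper(pos, line):  # map phase
            groups[k].append(v)         # grouping/sorting by key
    return dict(reducer(k, vs) for k, vs in sorted(groups.items()))

counts = mapreduce(["Hello world. Hello MapReduce."], map_fn, reduce_fn)
print(counts["Hello"])  # 2: the pair emitted twice is summed by reduce
```
</p>
<p>Here mapreduce, map_fn and reduce_fn are illustrative names; in Hadoop the grouping and sorting between the two phases is supplied by the framework itself.</p>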
      </sec>
      <sec id="sec-2-2">
        <title>Defeasible Logic - Syntax</title>
        <p>
          A defeasible theory [
          <xref ref-type="bibr" rid="ref21">21</xref>
          ], [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] (a knowledge base in defeasible logic) consists of five
different kinds of knowledge: facts, strict rules, defeasible rules, defeaters, and a
superiority relation.
        </p>
        <p>Facts are literals that are treated as known knowledge (given or observed facts).</p>
        <p>Strict rules are rules in the classical sense: whenever the premises are indisputable
(e.g., facts) then so is the conclusion. An example of a strict rule is “Emus are birds”,
which can be written formally as: “emu(X) → bird(X).”.</p>
        <p>Defeasible rules are rules that can be defeated by contrary evidence. An example of
such a rule is “Birds typically fly”; written formally: “bird(X) ⇒ flies(X).”.</p>
        <p>Defeaters are rules that cannot be used to draw any conclusions. Their only use is
to prevent some conclusions. An example is “If an animal is heavy then it might not be
able to fly”. Formally: “heavy(X) ❀ ¬flies(X).”.</p>
        <p>The superiority relation among rules is used to define priorities among rules, that
is, where one rule may override the conclusion of another rule. For example, given the
defeasible rules “r: bird(X) ⇒ flies(X)” and “r′: brokenWing(X) ⇒ ¬flies(X)” which
contradict one another, no conclusive decision can be made about whether a bird with
broken wings can fly. But if we introduce a superiority relation &gt; with r′ &gt; r, with the
intended meaning that r′ is strictly stronger than r, then we can indeed conclude that the
bird cannot fly.</p>
        <p>Note that in this paper, the aforementioned term literal is defined strictly by the
defeasible logic semantics. An RDF triple can be represented as a literal; however, the
term should not be confused with an RDF literal.</p>
      </sec>
      <sec id="sec-2-3">
        <title>Defeasible Logic - Formal Definition</title>
        <p>A rule r consists of (a) its antecedent (or body) A(r), which is a finite set of literals, (b)
an arrow, and (c) its consequent (or head) C(r), which is a literal. Given a set R of rules,
we denote the set of all strict rules in R by Rs, and the set of strict and defeasible rules
in R by Rsd. R[q] denotes the set of rules in R with consequent q. If q is a literal, ∼q
denotes the complementary literal (if q is a positive literal p then ∼q is ¬p; and if q is
¬p, then ∼q is p).</p>
        <p>A defeasible theory D is a triple (F, R, &gt;) where F is a finite set of facts, R a finite
set of rules, and &gt; a superiority relation upon R.</p>
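<p>For concreteness, the definitions above can be mirrored in a small data model. The following Python sketch uses hypothetical names (Literal, Rule, Theory) that are not part of the paper's implementation:</p>
<p>
```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Literal:
    predicate: str
    args: tuple
    negated: bool = False

    def complement(self):
        # ~q: if q is a positive literal p then ~q is ¬p, and vice versa
        return Literal(self.predicate, self.args, not self.negated)

@dataclass(frozen=True)
class Rule:
    rid: str
    body: tuple           # A(r): a finite set of literals
    head: Literal         # C(r): a single literal
    kind: str = "strict"  # "strict", "defeasible" or "defeater"

@dataclass
class Theory:
    facts: set            # F
    rules: list           # R
    superior: set = field(default_factory=set)  # pairs (r1, r2) meaning r1 > r2

    def rules_with_head(self, q):
        # R[q]: the rules in R with consequent q
        return [r for r in self.rules if r.head == q]
```
</p>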
      </sec>
      <sec id="sec-2-4">
        <title>Defeasible Logic - Proof Theory</title>
        <p>A conclusion of D is a tagged literal and can have one of the following four forms:
– +∆q, meaning that q is definitely provable in D.
– −∆q, meaning that q is not definitely provable in D (this does not necessarily mean
that ∼q is definitely provable).
– +∂q, meaning that q is defeasibly provable in D.
– −∂q, meaning that q is not defeasibly provable in D (this does not necessarily mean
that ∼q is defeasibly provable).</p>
        <p>
          Provability is defined below. It is based on the concept of a derivation (or proof) in
D = (F, R, &gt;). A derivation is a finite sequence P = P(1), ..., P(n) of tagged literals
satisfying the following conditions. The conditions are essentially inference rules phrased
as conditions on proofs. P(1..i) denotes the initial part of the sequence P of length i. For
more details on provability and an explanation of the intuition behind the conditions
below, see [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
        <p>+∆: We may append P(i + 1) = +∆q if either
q ∈ F or
∃r ∈ Rs[q] ∀α ∈ A(r): +∆α ∈ P(1..i)</p>
        <p>−∆: We may append P(i + 1) = −∆q if
q ∉ F and
∀r ∈ Rs[q] ∃α ∈ A(r): −∆α ∈ P(1..i)</p>
        <p>+∂: We may append P(i + 1) = +∂q if either
(1) +∆q ∈ P(1..i) or
(2) (2.1) ∃r ∈ Rsd[q] ∀α ∈ A(r): +∂α ∈ P(1..i) and
(2.2) −∆∼q ∈ P(1..i) and
(2.3) ∀s ∈ R[∼q] either
(2.3.1) ∃α ∈ A(s): −∂α ∈ P(1..i) or
(2.3.2) ∃t ∈ Rsd[q] such that
∀α ∈ A(t): +∂α ∈ P(1..i) and t &gt; s</p>
        <p>−∂: We may append P(i + 1) = −∂q if
(1) −∆q ∈ P(1..i) and
(2) (2.1) ∀r ∈ Rsd[q] ∃α ∈ A(r): −∂α ∈ P(1..i) or
(2.2) +∆∼q ∈ P(1..i) or
(2.3) ∃s ∈ R[∼q] such that
(2.3.1) ∀α ∈ A(s): +∂α ∈ P(1..i) and
(2.3.2) ∀t ∈ Rsd[q] either
∃α ∈ A(t): −∂α ∈ P(1..i) or t ≯ s</p>
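<p>For intuition, the +∆ condition in isolation amounts to forward chaining over the facts and strict rules. The following Python sketch shows this for grounded (propositional) literals; it illustrates the condition only and is not the paper's implementation:</p>
<p>
```python
def definite_conclusions(facts, strict_rules):
    # strict_rules: (body, head) pairs; body is a tuple of literals.
    # A literal q is +∆ iff q is a fact, or some strict rule with head q
    # has every body literal already proved +∆.
    provable = set(facts)
    changed = True
    while changed:
        changed = False
        for body, head in strict_rules:
            if head not in provable and all(b in provable for b in body):
                provable.add(head)  # append P(i + 1) = +∆ head
                changed = True
    return provable

# emu(X) → bird(X), grounded for a single constant
concl = definite_conclusions({"emu"}, [(("emu",), "bird")])
print(sorted(concl))  # ['bird', 'emu']
```
</p>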
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Algorithm description</title>
      <p>
        The algorithm described in this section shows how parallel reasoning can be
performed using the MapReduce framework. Parallel reasoning can be based either on
rule partitioning or on data partitioning [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. Rule partitioning assigns the computation
of each rule to a computer in the cluster. However, balanced work distribution in this
case is difficult to achieve, as the computational burden per rule (and node) depends
on the structure of the rule set. On the other hand, data partitioning assigns a subset of
data to each computer. Data partitioning is more flexible, providing more fine-grained
partitioning and allowing easier distribution among nodes in a balanced manner. Our
approach is based on data partitioning.
      </p>
      <p>For reasons that will be explained later, defeasible reasoning over rule sets with
multi-argument predicates is based on the dependencies between predicates, which are
encoded using the predicate dependency graph. Thus, rule sets can be divided into two
categories: stratified and non-stratified. Intuitively, a stratified rule set can be represented
as an acyclic hierarchy of dependencies between predicates, while a non-stratified one
cannot. We address the problem for stratified rule sets by providing a well-defined
reasoning sequence, and explain at the end of the section the challenges for non-stratified rule
sets.</p>
      <p>The dependencies between predicates can be represented using a predicate
dependency graph. For a given rule set, the predicate dependency graph is a directed graph
whose:
– vertices correspond to predicates. For each literal p, both p and ¬p are represented
by the positive predicate.
– edges are directed from a predicate that belongs to the body of a rule, to a predicate
that belongs to the head of the same rule. Edges are used for all three rule types
(strict rules, defeasible rules, defeaters).</p>
      <p>Stratified rule sets (correspondingly, non-stratified rule sets) are rule sets whose
predicate dependency graph is acyclic (correspondingly, contains a cycle). Stratified
theories are theories based on stratified rule sets. Figure 1a depicts the predicate
dependency graph of a stratified rule set, while Figure 1b depicts the predicate dependency
graph of a non-stratified rule set. The superiority relation is not part of the graph.</p>
      <p>As an example of a stratified rule set, consider the following:
r1: X sentApplication A, A completeFor D ⇒ X acceptedBy D.
r2: X hasCertificate C, C notValidFor D ⇒ X ¬acceptedBy D.
r3: X acceptedBy D, D subOrganizationOf U ⇒ X studentOfUniversity U.
r1 &gt; r2.</p>
      <p>The predicate dependency graph for the above rule set is depicted in Figure 1a. The
predicate graph can be used to determine strata for the different predicates. In particular,
predicates (nodes) with no outgoing edges are assigned the maximum stratum, which is
equal to the maximum depth of the directed acyclic graph (i.e., the length of the longest
path that can be defined through its edges), say k. Then, all predicates that are connected
with a predicate of stratum k are assigned stratum k − 1, and the process continues
recursively until all predicates have been assigned some stratum. Note that predicates are
reassigned to a lower stratum in case of multiple dependencies. The dashed horizontal
lines in Figure 1a are used to separate the various strata, which, in our example, are as
follows:</p>
      <p>Stratum 2: studentOfUniversity
Stratum 1: acceptedBy, subOrganizationOf
Stratum 0: sentApplication, completeFor, hasCertificate, notValidFor</p>
      <p>
        Stratified theories are often called decisive in the literature [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ].
      </p>
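<p>The stratum assignment just described (sinks receive the maximum stratum k, their predecessors k − 1, with reassignment to lower strata under multiple dependencies) is equivalent to computing, for each predicate, its longest path to a sink in the acyclic graph. A small Python sketch under that reading (illustrative only, not the paper's code):</p>
<p>
```python
from functools import lru_cache

def strata(edges, nodes):
    # edges: predicate to set of successor predicates (body to head)
    @lru_cache(maxsize=None)
    def height(n):
        # length of the longest path from n to a sink
        succs = edges.get(n, ())
        return 0 if not succs else 1 + max(height(s) for s in succs)
    k = max(height(n) for n in nodes)  # maximum depth of the DAG
    return {n: k - height(n) for n in nodes}

# Predicate dependency graph of rules r1-r3 from the running example
edges = {
    "sentApplication": {"acceptedBy"}, "completeFor": {"acceptedBy"},
    "hasCertificate": {"acceptedBy"}, "notValidFor": {"acceptedBy"},
    "acceptedBy": {"studentOfUniversity"},
    "subOrganizationOf": {"studentOfUniversity"},
}
assignment = strata(edges, set(edges) | {"studentOfUniversity"})
print(assignment["studentOfUniversity"], assignment["subOrganizationOf"])  # 2 1
```
</p>
<p>Note that subOrganizationOf lands in stratum 1 rather than 0, even though no rule derives it, precisely because it feeds a stratum-2 predicate; this matches the reassignment rule above.</p>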
      <sec id="sec-3-2">
        <title>Proposition 1. [5] If D is stratified, then for each literal p:</title>
        <p>(a) either D ⊢ +∆p or D ⊢ −∆p
(b) either D ⊢ +∂p or D ⊢ −∂p</p>
        <p>Thus, there are three possible states for each literal p in a stratified theory: (a) +∆p
and +∂p, (b) −∆p and +∂p and (c) −∆p and −∂p.</p>
        <p>Reasoning is based on facts. According to the defeasible logic proof theory, facts are
+∆, and every literal that is +∆ is +∂ too. Having +∆ and +∂ in our initial knowledge
base, it is convenient to store and perform reasoning only for +∆ and +∂ predicates.</p>
        <p>This representation of knowledge allows us to reason and store provability
information regarding various facts more efficiently. In particular, if a literal is not found as a
+∆ (correspondingly, +∂) then it is −∆ (correspondingly, −∂). In addition, stratified
defeasible theories have the property that if we have computed all the +∆ and +∂
conclusions up to a certain stratum, and a rule whose body contains facts of said stratum
does not currently fire, then this rule will also be inapplicable in subsequent passes; this
provides a well-defined reasoning sequence, namely considering rules from lower to
higher strata.</p>
        <sec id="sec-3-2-1">
          <title>Reasoning overview</title>
          <p>During reasoning we will use the representation &lt;fact, (+∆, +∂)&gt; to store our
inferred facts. We begin by transforming the given facts, in a single MapReduce pass,
into &lt;fact, (+∆, +∂)&gt;.</p>
          <p>Now let us consider for example the facts “John sentApplication App”, “App
completeFor Dep”, “John hasCertificate Cert”, “Cert notValidFor Dep” and “Dep
subOrganizationOf Univ”. The initial pass on these facts using the aforementioned rule set
will create the following output:
&lt;John sentApplication App, (+∆, +∂)&gt;
&lt;John hasCertificate Cert, (+∆, +∂)&gt;
&lt;Dep subOrganizationOf Univ, (+∆, +∂)&gt;
&lt;App completeFor Dep, (+∆, +∂)&gt;
&lt;Cert notValidFor Dep, (+∆, +∂)&gt;</p>
          <p>No reasoning needs to be performed for the lowest stratum (stratum 0), since these
predicates (sentApplication, completeFor, hasCertificate, notValidFor) do not belong
to the head of any rule. As is obvious from the definition of +∂ and −∂, defeasible logic
introduces uncertainty regarding inference, because certain facts/rules may “block” the
firing of other rules. This can be prevented if we reason for each stratum separately,
starting from the lowest stratum and continuing to higher strata. This is why, for a
hierarchy of N strata, we have to perform the procedure described below N − 1 times.
In order to perform defeasible reasoning we have to run two passes for each
stratum. The first pass computes which rules can fire. The second pass performs the
actual reasoning and computes for each literal whether it is definitely or defeasibly
provable. The reasons for both decisions (reasoning sequence and two passes per stratum)
are explained at the end of the next subsection.</p>
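<p>The overall control flow (initial transformation, then two MapReduce jobs per stratum, from lower to higher strata) can be sketched as follows; the two pass functions are stubs standing in for the jobs described in the next subsections, and all names are illustrative:</p>
<p>
```python
def fired_rules_pass(kb, stratum):
    # Pass 1 (fired rules calculation): join-based, described below. Stub.
    return []

def defeasible_reasoning_pass(kb, fired, stratum):
    # Pass 2 (defeasible reasoning): per-literal resolution. Stub.
    return {}

def reason(facts, num_strata):
    # Initial pass: tag every given fact as both +∆ and +∂
    kb = {fact: ("+∆", "+∂") for fact in facts}
    # For N strata the two passes run N − 1 times (stratum-0 predicates
    # never occur in rule heads, so no reasoning is needed for them)
    for stratum in range(1, num_strata):
        fired = fired_rules_pass(kb, stratum)
        kb.update(defeasible_reasoning_pass(kb, fired, stratum))
    return kb

kb = reason(["John sentApplication App"], 3)
print(kb["John sentApplication App"])  # ('+∆', '+∂')
```
</p>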
        </sec>
        <sec id="sec-3-2-2">
          <title>Pass #1: Fired rules calculation</title>
          <p>
            During the first pass, we calculate which rules fire; this is performed by
joining predicates on common argument values. Such techniques for basic and
multi-way joins have been described in [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] and optimized in [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]. In order to achieve an efficient
implementation, the optimizations in [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ] should be taken into consideration. Here we
elaborate on our approach for basic joins and explain at the end of the subsection how it
can be generalized to multi-way joins.
          </p>
          <p>Basic join is performed on common argument values. Consider the following rule:
r1: X sentApplication A, A completeFor D ⇒ X acceptedBy D.</p>
          <p>The key observation is that “X sentApplication A” and “A completeFor D” can be
joined on their common argument A. Based on this observation, during the M ap
operation, we emit pairs of the form &lt;A, (X, sentApplication)&gt; for predicate sentApplication
and &lt;A, (D, completeFor)&gt; for predicate completeFor. The idea is to join
sentApplication and completeFor only for literals that have the same value on argument A. During
the Reduce operation we combine sentApplication and completeFor producing
acceptedBy.</p>
          <p>In our example, the facts “John sentApplication App” and “App completeFor Dep”
will cause M ap to emit &lt;App, (John, sentApplication)&gt; and &lt;App, (Dep,
completeFor)&gt;. The MapReduce framework groups and sorts intermediate pairs passing &lt;App,
&lt;(John, sentApplication), (Dep, completeFor)&gt;&gt; to the Reduce operation. Finally, at
Reduce we combine given values and infer “John acceptedBy Dep”.</p>
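<p>The basic join just described can be simulated in plain Python as a single-process sketch of the two phases for rule r1 (illustrative names; in the actual system this runs as a Hadoop job):</p>
<p>
```python
from collections import defaultdict

# Stratum-0 facts as (subject, predicate, object) triples
facts = [("John", "sentApplication", "App"), ("App", "completeFor", "Dep")]

# Map phase: key sentApplication by its object and completeFor by its
# subject, so both sides of r1 meet at the shared argument A.
grouped = defaultdict(lambda: {"sentApplication": [], "completeFor": []})
for s, p, o in facts:
    if p == "sentApplication":
        grouped[o]["sentApplication"].append(s)  # emit (A, (X, sentApplication))
    elif p == "completeFor":
        grouped[s]["completeFor"].append(o)      # emit (A, (D, completeFor))

# Reduce phase: for each join key, combine the two sides to fire r1.
fired = [(x, "acceptedBy", d)
         for key, sides in grouped.items()
         for x in sides["sentApplication"]
         for d in sides["completeFor"]]
print(fired)  # [('John', 'acceptedBy', 'Dep')]
```
</p>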
          <p>To support defeasible logic, where rules may block the firing of other rules, this
approach must be extended: we must record all fired rules prior to any conclusion
inference, whereas for monotonic logics this is not necessary and conclusion derivation can
be performed immediately. The reason why this is so is explained at the end of the
subsection. Pseudo-code for the Map and Reduce functions, for a basic join, is depicted in
Algorithm 2. The Map function reads input of the form &lt;literal, (+∆, +∂)&gt; or
&lt;literal, (+∂)&gt; and emits pairs of the form &lt;matchingArgumentValue,
(nonMatchingArgumentValue, Predicate, +∆, +∂)&gt; or &lt;matchingArgumentValue,
(nonMatchingArgumentValue, Predicate, +∂)&gt; respectively.</p>
          <p>Algorithm 2 Fired rules calculation
map(Long key, String value):
// key: position in document (irrelevant)
// value: document line (derived conclusion)
For every common argumentValue in value
EmitIntermediate(argumentValue, value);
reduce(String key, Iterator values):
// key: matching argument
// values: literals for matching
For every argument value match in values
If strict rule fired with all premises being +∆ then
Emit(firedLiteral, “[¬,] +∆, +∂, ruleID”);
else
Emit(firedLiteral, “[¬,] +∂, ruleID”);</p>
          <p>Now consider again the stratified rule set described in the beginning of the section,
for which the initial pass will produce the following output:
&lt;John sentApplication App, (+∆, +∂)&gt;
&lt;John hasCertificate Cert, (+∆, +∂)&gt;
&lt;Dep subOrganizationOf Univ, (+∆, +∂)&gt;
&lt;App completeFor Dep, (+∆, +∂)&gt;
&lt;Cert notValidFor Dep, (+∆, +∂)&gt;</p>
          <p>We perform reasoning for stratum 1, so we will use as premises all the available
information for predicates of stratum 0. The Map function will emit the following pairs:
&lt;App, (John, sentApplication, +∆, +∂)&gt;
&lt;App, (Dep, completeFor, +∆, +∂)&gt;
&lt;Cert, (John, hasCertificate, +∆, +∂)&gt;
&lt;Cert, (Dep, notValidFor, +∆, +∂)&gt;</p>
          <p>The MapReduce framework will perform grouping/sorting, resulting in the
following intermediate pairs:
&lt;App, &lt;(John, sentApplication, +∆, +∂), (Dep, completeFor, +∆, +∂)&gt;&gt;
&lt;Cert, &lt;(John, hasCertificate, +∆, +∂), (Dep, notValidFor, +∆, +∂)&gt;&gt;</p>
          <p>During reduce we combine premises in order to emit the firedLiteral, which consists
of the head predicate of the fired rule and the nonMatchingArgumentValues of the
premises. However, inference depends on the type of the rule. In general, for all three rule
types (strict rules, defeasible rules and defeaters), if a rule fires then we emit as output
&lt;firedLiteral, ([¬,] +∂, ruleID)&gt; ([¬,] denotes that “¬” is optional and appended only
if the firedLiteral is negative). However, there is a special case for strict rules, which
covers the information required for inferring +∆ conclusions: if all premises are +∆ then
we emit as output &lt;firedLiteral, ([¬,] +∆, +∂, ruleID)&gt; instead of &lt;firedLiteral,
([¬,] +∂, ruleID)&gt;.</p>
          <p>For example, during the reduce phase the reducer with key
App will emit &lt;John acceptedBy Dep, (+∂, r1)&gt;, while the reducer with key
Cert will emit &lt;John acceptedBy Dep, (¬, +∂, r2)&gt;.</p>
          <p>As we see here, “John acceptedBy Dep” and “John ¬acceptedBy Dep” are computed
by different reducers (with keys App and Cert respectively), which do not communicate
with each other. Thus, neither of the two reducers has all the information needed
to perform defeasible reasoning. Therefore, we need a second pass for the reasoning.</p>
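<p>The decision that the second pass makes for each literal can be sketched as follows, simplified to defeasible rules plus the superiority relation (ignoring the −∆∼q and defeater conditions of the full proof theory; names are illustrative):</p>
<p>
```python
def resolve(fired_for, fired_against, superior):
    # fired_for / fired_against: ids of rules that fired supporting q / ~q.
    # superior: set of pairs (stronger, weaker).
    # q is +∂ if some rule supports it and every attacking rule is
    # overridden by a stronger supporting rule (condition 2.3.2).
    if not fired_for:
        return False
    return all(any((t, s) in superior for t in fired_for)
               for s in fired_against)

# Running example: r1 supports "John acceptedBy Dep", r2 attacks it, r1 > r2
print(resolve({"r1"}, {"r2"}, {("r1", "r2")}))  # True: acceptedBy is +∂
print(resolve({"r2"}, {"r1"}, {("r1", "r2")}))  # False: r2 does not override r1
```
</p>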
          <p>Let us illustrate why reasoning has to be performed for each stratum separately,
requiring stratified rule sets. Consider again our running example. We will attempt to
perform reasoning for all the strata simultaneously. On the one hand, we cannot join
“John acceptedBy Dep” with “Dep subOrganizationOf Univ” prior to the second pass
because we do not have a final conclusion on “John acceptedBy Dep”. Thus, we will
not perform reasoning for “John studentOfUniversity Univ” during the second pass,
which leads to data loss. On the other hand, if another rule (say r4) supporting “John
¬studentOfUniversity Univ” had also fired, then during the second pass, we would have
mistakenly inferred “John ¬studentOfUniversity Univ”, leading our knowledge base to
inconsistency.</p>
          <p>
            In case of multi-way joins we compute the head of the rule (firedLiteral) by
performing joins, on common argument values, in one or more MapReduce passes as explained
in [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] and [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ]. As above, for each fired rule, we must take into consideration the type of
the rule and whether all the premises are +∆ or not. Finally, the format of the output
remains the same (&lt;firedLiteral, ([¬,] +∆, +∂, ruleID)&gt; or &lt;firedLiteral, ([¬,] +∂,
ruleID)&gt;).
          </p>
        </sec>
        <sec id="sec-3-2-3">
          <title>Pass #2: Defeasible reasoning</title>
          <p>We proceed with the second pass. Once fired rules are calculated, a second MapReduce
pass performs reasoning for each literal separately. We should take into consideration
that each literal being processed could already exist in our knowledge base (due to the
initial pass). In this case, we perform duplicate elimination by not emitting pairs for
existing conclusions. The pseudo-code for the Map and Reduce functions, for stratified
rule sets, is depicted in Algorithm 3.</p>
          <p>After both the initial pass and the fired rules calculation (first pass), our knowledge
base will consist of:
&lt;John sentApplication App, (+∆, +∂)&gt;
&lt;App completeFor Dep, (+∆, +∂)&gt;
&lt;John hasCertificate Cert, (+∆, +∂)&gt;
&lt;Cert notValidFor Dep, (+∆, +∂)&gt;
&lt;Dep subOrganizationOf Univ, (+∆, +∂)&gt;
&lt;John acceptedBy Dep, (+∂, r1)&gt;
&lt;John acceptedBy Dep, (¬, +∂, r2)&gt;</p>
<p>During the Map operation, we first extract from the value the literal and the
inferred knowledge or fired rule, using extractLiteral() and extractKnowledge()
respectively. For each literal p, both p and ¬p are sent to the same reducer; the “¬” in
the knowledge distinguishes p from ¬p. The Map function will emit the following pairs:
&lt;Dep subOrganizationOf Univ, (+∆, +∂)&gt; &lt;John acceptedBy Dep, (+∂, r1)&gt;
&lt;John acceptedBy Dep, (¬, +∂, r2)&gt;
Algorithm 3 Defeasible reasoning:</p>
          <p>map(Long key, String value):
// key: position in document (irrelevant)
// value: inferred knowledge/fired rules
String p = extractLiteral(value);
If p does not belong to current stratum then return;
String knowledge = extractKnowledge(value);
EmitIntermediate(p, knowledge);</p>
          <p>reduce(String p, Iterator values):
// p: a literal
// values: inferred knowledge/fired rules
For each value in values: markKnowledge(value);
For each literal in {p, ¬p} check:
  If literal is already +∆ then return;
  Else if strict rule fired with all premises being +∆ then Emit(literal, “+∆, +∂”);
  Else if literal is +∂ after defeasible reasoning then Emit(literal, “+∂”);</p>
<p>The MapReduce framework will perform grouping/sorting, resulting in the following
intermediate pairs:
&lt;Dep subOrganizationOf Univ, (+∆, +∂)&gt;
&lt;John acceptedBy Dep, &lt;(+∂, r1), (¬, +∂, r2)&gt;&gt;</p>
          <p>For the Reduce, the key contains the literal and the values contain all the available
information for that literal (known knowledge, fired rules). We traverse over values
marking known knowledge and fired rules using the markKnowledge() function.
Subsequently, we use this information in order to perform reasoning for each literal.</p>
<p>During the reduce phase, the reducer with key “Dep subOrganizationOf Univ” will
not emit anything, while the reducer with key “John acceptedBy Dep” will emit
&lt;John acceptedBy Dep, (+∂)&gt;. The literal “Dep subOrganizationOf Univ” is known
knowledge, for which a potential duplicate elimination must be performed. We reason
simultaneously for both “John acceptedBy Dep” and “John ¬acceptedBy Dep”; as
“John ¬acceptedBy Dep” is −∂, it does not need to be recorded.</p>
        </sec>
        <sec id="sec-3-2-4">
          <title>Final remarks</title>
          <p>The total number of MapReduce passes is independent of the size of the given data,
and is determined by the form of the rules, in particular by the number of strata that the
rules are stratified into. As mentioned in subsection 3.2, performing reasoning for each
stratum separately eliminates data loss and inconsistency; thus our approach is sound
and complete, since it fully complies with defeasible logic provability. Eventually,
our knowledge base consists of +∆ and +∂ literals.</p>
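          <p>Because the pass count depends only on the stratification, it can be computed up front from the rule set's predicate dependency graph. The following is a minimal sketch (the graph encoding and the toy fragment are ours, for illustration) assigning each predicate the stratum one above the predicates it depends on:</p>
          <p>
```python
from functools import lru_cache

# Toy predicate dependency graph (head predicate: predicates appearing
# in rule bodies); the encoding and fragment are illustrative only.
deps = {
    "acceptedBy": ("sentApplication", "hasCertificate"),
    "studentOfUniversity": ("acceptedBy", "subOrganizationOf"),
}

@lru_cache(maxsize=None)
def stratum(pred):
    # Base predicates (no defining rules) sit in stratum 0; a derived
    # predicate sits one stratum above the highest one it depends on.
    return 1 + max((stratum(b) for b in deps.get(pred, ())), default=-1)

all_preds = set(deps).union(b for bs in deps.values() for b in bs)
num_strata = 1 + max(stratum(p) for p in all_preds)  # 3 in this fragment
```
          </p>
          <p>The number of reasoning passes is then bounded by num_strata, independently of the data size.</p>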
          <p>
            The situation for non-stratified rule sets is more complex. Reasoning can be based
on the algorithm described in [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ], performing reasoning until no new conclusion is
derived. However, the total number of required passes is generally unpredictable,
depending both on the given rule set and on the data distribution. Additionally, an efficient
mechanism for computing “∀r ∈ Rs[q] ∃α ∈ A(r): −∆α ∈ P(1..ı)” (in −∆ provability) and
“∀r ∈ Rsd[q] ∃α ∈ A(r): −∂α ∈ P(1..ı)” (in 2.1 of −∂ provability) has yet to be defined,
because all the available information for a literal must be processed by a single node
(nodes do not communicate with each other), causing either main-memory exhaustion
or skewed load balancing that reduces parallelization. Finally, we would have to reason
about and store every possible conclusion (+∆, −∆, +∂, −∂), producing a significantly
larger stored knowledge base.
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Evaluation</title>
<p>In this section, we present the methodology, dataset and experimental results for
an implementation of our approach using Hadoop.</p>
<p>Methodology. Our evaluation is centered on scalability and the capacity of
our system to handle large datasets. In line with standard practice in the field of
high-performance systems, we define scalability as the ability to process datasets of
increasing size in a proportional amount of time, and the ability of the system to
perform well as computational resources increase. With regard to the former, we have
performed experiments using datasets of various sizes (yet with similar characteristics).</p>
      <p>
        With regard to scaling computational resources, it has been empirically observed
that the main inhibitor of parallel reasoning systems has been load-balancing between
compute nodes [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Thus, we have also focused our scalability evaluation on this
aspect.
      </p>
      <p>The communication model of Hadoop is not sensitive to the physical location of
each data partition. In our experiments, Map tasks only use local data (implying very
low communication costs) and Reduce operates using hash-partitioning to distribute
data across the cluster (resulting in very high communication costs regardless of the
distribution of data and cluster size). In this light, scalability problems arise not
from the number of compute nodes but from the unequal distribution of the workload
across reduce tasks. As the number of compute nodes increases, this unequal distribution
becomes visible and hampers performance.</p>
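      <p>The hash-partitioning behavior described above can be mimicked in a few lines (a sketch only; Hadoop's HashPartitioner uses the Java key's hashCode modulo the number of reduce tasks, which Python's hash merely approximates):</p>
      <p>
```python
from collections import Counter

def partition(key, num_reducers):
    # The same key always lands on the same reduce task, regardless of
    # where the data physically resides.
    return hash(key) % num_reducers

keys = [f"literal-{i}" for i in range(100_000)]
load = Counter(partition(k, 40) for k in keys)
# With many distinct keys, the 40 reduce tasks receive near-equal shares;
# a single very frequent key would instead pile its whole group onto one
# task, which is exactly the skew that limits scalability.
```
      </p>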
      <p>Dataset. We have used the most popular benchmark for reasoning systems, LUBM.
LUBM allows us to scale the size of the data to an arbitrary size while keeping the
reasoning complexity constant. For our experiments, we generated up to 8000 universities
resulting in approximately 1 billion triples.</p>
<p>Rule set. The logic of LUBM can be partially expressed using RDFS and
OWL 2 RL. Nevertheless, neither of these logics is defeasible. Thus, to evaluate our system,
we have created the following rule set:
r1: X rdf:type FullProfessor → X rdf:type Professor.
r2: X rdf:type AssociateProfessor → X rdf:type Professor.
r3: X rdf:type AssistantProfessor → X rdf:type Professor.
r4: P publicationAuthor X, P publicationAuthor Y → X commonPublication Y.
r5: X teacherOf C, Y takesCourse C → X teaches Y.
r6: X teachingAssistantOf C, Y takesCourse C → X teaches Y.
r7: X commonPublication Y → X commonResearchInterests Y.
r8: X hasAdvisor Z, Y hasAdvisor Z → X commonResearchInterests Y.
r9: X hasResearchInterest Z, Y hasResearchInterest Z → X commonResearchInterests Y.
r10: X hasAdvisor Y ⇒ X canRequestRecommendationLetter Y.
r11: Y teaches X ⇒ X canRequestRecommendationLetter Y.
r12: Y teaches X, Y rdf:type PostgraduateStudent ⇒ X ¬canRequestRecommendationLetter Y.
r13: X rdf:type Professor, X worksFor D, D subOrganizationOf U ⇒ X canBecomeDean U.
r14: X rdf:type Professor, X headOf D, D subOrganizationOf U ⇒ X ¬canBecomeDean U.
r15: X worksFor D ⇒ X canBecomeHeadOf D.
r16: X worksFor D, Z headOf D, X commonResearchInterests Z ⇒ X ¬canBecomeHeadOf D.
r17: Y teaches X ⇒ X suggestAdvisor Y.
r18: Y teaches X, X hasAdvisor Z ❀ X ¬suggestAdvisor Y.
r12 &gt; r11, r14 &gt; r13, r16 &gt; r15, r18 &gt; r17.</p>
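      <p>As an illustration of how the defeasible rules above interact, consider r11 and r12 together with the superiority r12 &gt; r11: a postgraduate teaching assistant blocks the recommendation-letter conclusion. The following is a hypothetical sketch (the encoding and entity names are ours, not part of the system):</p>
      <p>
```python
# Hypothetical encoding of rules r11/r12 with the superiority r12 beats r11.
facts = {("teaches", "TA1", "Stud1"),
         ("rdf:type", "TA1", "PostgraduateStudent")}
superiority = {("r12", "r11")}

fired = []  # (rule, sign, literal)
for (_, y, x) in (f for f in facts if f[0] == "teaches"):
    fired.append(("r11", "+", ("canRequestRecommendationLetter", x, y)))
    if ("rdf:type", y, "PostgraduateStudent") in facts:
        fired.append(("r12", "-", ("canRequestRecommendationLetter", x, y)))

lit = ("canRequestRecommendationLetter", "Stud1", "TA1")
pos = {r for r, s, l in fired if l == lit and s == "+"}  # supports lit
neg = {r for r, s, l in fired if l == lit and s == "-"}  # attacks lit
# r12 defeats every rule supporting the positive side, so the negative
# conclusion prevails: Stud1 cannot request a letter from TA1.
negative_wins = all(any((n, p) in superiority for n in neg) for p in pos)
```
      </p>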
      <p>MapReduce jobs description. We need 8 jobs in order to perform reasoning on
the above rule set. The first job is the initial pass described in Section 3 (which we
also use to compute rules r1-r3). For the rest of the jobs, we first compute fired rules
and then perform reasoning for each stratum separately. The second job computes rules
r4-r6. During the third job we perform duplicate elimination, since r4-r6 are strict rules.
We compute rules r7-r14 during the fourth job while reasoning on them, is performed
during the fifth job. Jobs six and seven compute rules r15-r18. Finally, during the eighth
job we perform reasoning on r15-r18, finishing the whole procedure.</p>
      <p>
Platform. Our experiments were performed on an IBM x3850 server with 40 cores
and 750GB of RAM, connected to a XIV Storage Area Network (SAN), using a 10Gbps
storage switch. We have used IBM Hadoop Cluster v1.3, which is compatible with
Hadoop v0.20.2, along with an optimization to reduce Map task overhead, in line with
[
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. Although our experiments were run on a single machine, there was no direct
communication between processes and all data was transferred through persistent storage.
We have used a number of Mappers and Reducers equal to the number of cores in the
system (i.e. 40).
      </p>
      <p>Results. Figure 2 shows the runtimes of our system for varying input sizes. We make
the following observations: (a) Even for a single node, our system is able to handle very
large datasets, easily scaling to 1 billion triples. (b) The scaling properties with regard
to dataset size are excellent: in fact, as the size of the input increases, the throughput
of our system increases. For example, while our system can process a dataset of 125
million triples at a throughput of 27Ktps, for 1 billion triples, the throughput becomes
63Ktps. This is attributed to the fact that job startup costs are amortized over the longer
runtime of the bigger datasets.</p>
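      <p>The quoted throughputs imply the following approximate runtimes (plain arithmetic on the figures above, not additional measurements):</p>
      <p>
```python
# Throughput figures quoted in the text, in triples per second
runs = {125_000_000: 27_000, 1_000_000_000: 63_000}

for triples, tps in runs.items():
    hours = triples / tps / 3600
    print(f"{triples // 1_000_000}M triples at {tps // 1000}Ktps: {hours:.1f} h")
# 125M triples at 27Ktps: 1.3 h
# 1000M triples at 63Ktps: 4.4 h
```
      </p>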
<p>The above results show that our system is indeed capable of achieving high performance
and scales very well with the size of the input. [Figure 2: time in seconds versus
millions of facts, for inputs up to 1 billion triples.] Nevertheless, to further investigate
how our system would perform when the data size precludes the use of a single machine, it
is critical to examine the load-balancing properties of our algorithm.</p>
      <p>As previously described, in typical MapReduce applications, load-balancing
problems arise during the reduce phase. Namely, it is possible that the partitions of the data
processed in a single reduce task vary widely in terms of compute time required. This is
a potential scalability bottleneck. To test our system for such issues, we have launched
an experiment where we have increased the number of reduce tasks to 400. We can
expect that, if the load balance for 400 reduce tasks is relatively uniform, our system is
able to scale at least to that size.</p>
      <p>Figure 3 shows the load balance between different reduce tasks, for 1 billion triples
and 40 (Figure 3a) or 400 (Figure 3b) reduce tasks. In principle, an application performs
badly when a single task dominates the runtime, since all other tasks would need to wait
for it to finish. In our experiments, it is evident that no such task exists. In addition, one
may note that the system is actually faster with 400 reduce tasks. This is attributed both
to the fact that each core in our platform can process two threads in parallel, and to
implementation aspects of Hadoop whereby tasks processing approximately 1GB
demonstrate higher throughput than larger tasks.</p>
      <p>
Although a direct comparison is not meaningful, the throughput of our system is in
line with results obtained when doing monotonic reasoning using state-of-the-art RDF
stores and inference engines. For example, OWLIM claims a 14.4-hour loading time
for the same dataset when doing OWL-Horst inference 8. WebPIE [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], which is also based on MapReduce, presents an OWL-Horst inference time of 35 minutes, albeit on
64 lower-spec nodes and requiring an additional dictionary-encoding step.
      </p>
<p>Given the significant overhead of nonmonotonic reasoning, and in particular the
fact that inferences cannot be drawn directly, this result is counter-intuitive. The key to
the favorable performance of our approach is that the “depth” of the reasoning is fixed
on a per-rule-set basis. The immediate consequence is that the number of MapReduce
jobs, which bear significant startup costs, is also fixed. In other words, the “predictable”
nature of stratified logics allows us to have less complicated relationships between facts
in the system.</p>
      <p>
        Finally, we should take into consideration the fact that LUBM produces fairly
uniform data. Although there is significant skew in LUBM (e.g. in the frequency of terms
such as rdf:type), the rule set that we have used in the evaluation does not perform joins
on such highly skewed terms. However, our previous work [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ] shows that our approach
can cope with highly skewed data, which follows a Zipf distribution.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Future Work</title>
      <p>In this paper we studied the feasibility of nonmonotonic rule systems over large volumes
of semantic data. In particular, we considered defeasible reasoning over RDF, and ran
experiments over RDF data. Our results demonstrate that such reasoning scales very
well. In particular, we have shown that nonmonotonic reasoning is not only possible,
but can compete with state-of-the-art monotonic logics. To the best of our knowledge,
this is the first study demonstrating the feasibility of inconsistency-tolerant reasoning
over RDF data using mass parallelization techniques.</p>
      <p>In future work, we intend to perform an extensive experimental evaluation in order
to verify our results for different input dataset morphologies. In addition, we plan to
apply the MapReduce framework to ontology dynamics (including evolution, diagnosis,
and repair) approaches based on validity rules (integrity constraints). These problems
are closely related to inconsistency-tolerant reasoning, as violation of constraints may
be viewed as a logical inconsistency.
</p>
    </sec>
    <sec id="sec-6">
<title>Acknowledgments</title>
<p>This work was partially supported by the PlanetData NoE (FP7:ICT-2009.3.4, #257641).</p>
      <p>8 http://www.ontotext.com/owlim/benchmark-results/lubm</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Afrati</surname>
            ,
            <given-names>F.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          :
          <article-title>Optimizing joins in a mapreduce environment</article-title>
          .
          <source>In: EDBT</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
2. Antoniou, G., van Harmelen, F.: A Semantic Web Primer, 2nd edn. The MIT Press (March 2008)
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Antoniou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Nonmonotonic reasoning</article-title>
          . MIT Press (
          <year>1997</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kosters</surname>
          </string-name>
          , R.:
          <article-title>Nonstandard Inferences in Description Logics: The Story So Far</article-title>
          .
          <source>Mathematical Problems from Applied Logic I,</source>
          volume
          <volume>4</volume>
          of International Mathematical Series (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Billington</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          : Defeasible Logic is Stable.
          <source>J. Log. Comput</source>
          .
          <volume>3</volume>
          (
          <issue>4</issue>
          ),
          <fpage>379</fpage>
          -
          <lpage>400</lpage>
          (
          <year>1993</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Linked Data - The Story So Far</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst</source>
          .
          <volume>5</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
7. Brickley, D., Guha, R.: RDF Vocabulary Description Language 1.0: RDF Schema. www.w3.org/TR/2004/REC-rdf-schema-20040210 (2004)
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ghemawat</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>MapReduce: simplified data processing on large clusters</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
9. Fische, F.: Investigation &amp; Design for Rule-based Reasoning. Tech. rep., LarKC (2010)
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Goodman</surname>
            ,
            <given-names>E.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jimenez</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mizell</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Al-Saffar</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Adolf</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haglin</surname>
          </string-name>
          , D.J.: HighPerformance Computing Applied to Semantic Databases.
          <source>In: ESWC (2)</source>
          . pp.
          <fpage>31</fpage>
          -
          <lpage>45</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Haase</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lutz</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Complexity of Subsumption in the EL Family of Description Logics: Acyclic and Cyclic TBoxes</article-title>
          .
          <source>In: ECAI-08</source>
          . pp.
          <fpage>25</fpage>
          -
          <lpage>29</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Scalable Authoritative OWL Reasoning for the Web</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst</source>
          .
          <volume>5</volume>
          (
          <issue>2</issue>
          ),
          <fpage>49</fpage>
          -
          <lpage>90</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Kotoulas</surname>
          </string-name>
          , S., van
          <string-name>
            <surname>Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Weaver</surname>
          </string-name>
          , J.:
          <source>KR and Reasoning on the Semantic Web: WebScale Reasoning</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oren</surname>
          </string-name>
          , E.,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Mind the data skew: distributed inferencing by speeddating in elastic regions</article-title>
          .
          <source>In: WWW</source>
          . pp.
          <fpage>531</fpage>
          -
          <lpage>540</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Maher</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rock</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antoniou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Billington</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
<source>Efficient Defeasible Reasoning Systems. IJAIT 10</source>
          (
          <year>2001</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Maher</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          :
          <article-title>Propositional Defeasible Logic has Linear Complexity</article-title>
          .
          <article-title>CoRR cs</article-title>
          .
          <source>AI/0405090</source>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Maluszynski</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szalas</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Living with Inconsistency and Taming Nonmonotonicity</article-title>
          . In: Datalog. pp.
          <fpage>384</fpage>
          -
          <lpage>398</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Manola</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McBride</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          : RDF Primer. www.w3.org/TR/rdf-primer (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Mutharaju</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maier</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A MapReduce Algorithm for EL+</article-title>
          . In: Description Logics (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Nebel</surname>
          </string-name>
          , B.:
          <article-title>Terminological Reasoning is Inherently Intractable</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>43</volume>
          ,
          <fpage>235</fpage>
          -
          <lpage>249</lpage>
          (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Nute</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Defeasible Logic</article-title>
          .
          <source>In: Handbook of Logic in Artificial Intelligence and Logic Programming-Nonmonotonic Reasoning and Uncertain Reasoning(Volume 3)</source>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Oren</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Anadiotis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Siebes</surname>
          </string-name>
          , R., ten
          <string-name>
            <surname>Teije</surname>
          </string-name>
          , A.,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Marvin: Distributed reasoning over large-scale Semantic Web data</article-title>
          .
          <source>J. Web Sem</source>
          .
          <volume>7</volume>
          (
          <issue>4</issue>
          ),
          <fpage>305</fpage>
          -
          <lpage>316</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
Vernica, R., Balmin, A., Beyer, K., Ercegovac, V.:
          <article-title>Adaptive MapReduce using Situation-Aware Mappers</article-title>
          . In: EDBT
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Roussakis</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flouris</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christophides</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Declarative Repairing Policies for Curated KBs</article-title>
          . In: HDMS (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Serfiotis</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koffina</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christophides</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tannen</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Containment and Minimization of RDF(S) Query Patterns</article-title>
          . In: ISWC (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Tachmazidis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antoniou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flouris</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Towards Parallel Nonmonotonic Reasoning with Billions of Facts</article-title>
          . In: KR (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Tachmazidis</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Antoniou</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Flouris</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McCluskey</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Large-scale Parallel Stratified Defeasible Reasoning</article-title>
          . In: ECAI (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Urbani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maassen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bal</surname>
            ,
            <given-names>H.E.</given-names>
          </string-name>
          :
          <article-title>OWL Reasoning with WebPIE: Calculating the Closure of 100 Billion Triples</article-title>
          . In: ESWC (1)
          . pp.
          <fpage>213</fpage>
          -
          <lpage>227</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Urbani</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oren</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van Harmelen</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Scalable Distributed Reasoning Using MapReduce</article-title>
          . In: ISWC. pp.
          <fpage>634</fpage>
          -
          <lpage>649</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Weaver</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hendler</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          :
          <article-title>Parallel Materialization of the Finite RDFS Closure for Hundreds of Millions of Triples</article-title>
          . In: ISWC. pp.
          <fpage>682</fpage>
          -
          <lpage>697</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>