=Paper= {{Paper |id=None |storemode=property |title=Reflections on: Reasoning and Querying Web-scale Open Data based on DL-LiteA in a Divide-and-Conquer Way |pdfUrl=https://ceur-ws.org/Vol-2576/paper02.pdf |volume=Vol-2576 |authors=Zhenzhen Gu,Songmao Zhang,Cungen Cao |dblpUrl=https://dblp.org/rec/conf/semweb/GuZC19 }} ==Reflections on: Reasoning and Querying Web-scale Open Data based on DL-LiteA in a Divide-and-Conquer Way== https://ceur-ws.org/Vol-2576/paper02.pdf
       Reflections on: Reasoning and Querying
     Web-scale Open Data based on DL-LiteA in a
               Divide-and-Conquer Way

                      Zhenzhen Gu1 , Songmao Zhang2 , and Cungen Cao1
 1
         Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
     2
          Academy of Mathematics and Systems Sciences, Chinese Academy of Sciences,
                                        Beijing, China
                guzhenzhen0720@163.com smzhang@math.ac.cn cgcao@ict.ac.cn


1         Introduction
Problem Description. Although the sheer size of the open datasets [2] in
Linked Open Data (LOD) [3] can be viewed as a positive sign for Semantic Web
initiatives, it causes performance bottlenecks for systems and tools that provide
data managing and query answering services over the LOD datasets. Moreover,
as the growth of the schema knowledge described by OWL, realizing efficient
reasoning and expressive query answering has become a challenging task for both
data publishers and end users. Most of the studies in this respect (for example,
[4–6]) focused on dealing with the scalability issue whereas inferring knowledge
through reasoning is often ignored. Moreover, when reasoning is considered, most
of the time only lightweight RDFS reasoning can be supported.
Proposal of DL-LiteA . Motivated by a statistic analysis of the BTC 2012
dataset3 , we propose to use DL-LiteA [10] techniques to reason and query the
web-scale open datasets (knowledge bases) described by Semantic Web standards
like RDF and OWL due to the low reasoning complexity and suitable expressiv-
ity of the language. The distinguished feature of DL-LiteA is FOL-rewritability.
This means that in DL-LiteA , by schema knowledge reasoning, both satisfiabil-
ity checking and conjunctive query (CQ) answering can be eventually reduced
to answering queries over the data layers of the corresponding knowledge bases
(KBs). Moreover the separation between schema knowledge and data layer rea-
soning makes it possible to use highly optimized database management systems
and engines in practice for query answering.
Bottleneck of DL-LiteA . Unfortunately, even with low reasoning complexity,
when facing the real-life scalability challenge, the actual reasoning and CQ an-
swering realized by DL-LiteA may become infeasible by the following two factors.
Firstly, for both satisfiability checking and CQ answering, a polynomial size of
queries may need to be answered over the data layers of the corresponding KBs
w.r.t. the size of the schema knowledge of these KBs. Secondly, for the KBs
with massive individual assertions, evaluating a single query over the data lay-
ers may be highly time-consuming. The combination of these two factors makes
3
     https://km.aifb.kit.edu/projects/btc-2012/


Copyright © 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
2      Gu et al.

the situation even worse, calling for new techniques for DL-LiteA to tackle the
scalability and efficiency challenge.
Our Solution–Divide-and-Conquer. In computer science, divide-and-conquer
is the basis of efficient algorithms for all kinds of problems. This impels us to
seek for a divide-and-conquer reasoning and query answering approach for DL-
LiteA , with the basic idea of partitioning both KBs and CQs into smaller chunks
and decomposing the original reasoning and query answering tasks into a group
of independent sub-tasks such that the overall performance can be improved by
taking advantage of the parallelization and distribution techniques.
     The challenge for designing such an approach lies in how to carry out KB
and CQ partitioning and reasoning reduction in a sound and complete way.
Motivated by hash partitioning of RDF graphs (the partitions with the local
feature of star-query answering) [15], we expect the smaller KB chunks to have
the local feature for both satisfiability checking and simple-query answering. Here
simple-queries (SQs) are the CQs whose query atoms share a common variable
or individual. For CQ answering, we expect to partition a CQ into smaller SQs
and evaluate them over smaller KB chunks.
     Under these expectations, our divide-and-conquer approach is constructed
from both theoretical and practical perspectives. Theoretically, definitions of
KB partitions and CQ partitions are presented and the sufficient and necessary
conditions are identified to determine whether a KB partition holds the desired
features and whether a CQ can be answered in the desired way. Practically, based
on the theoretical results, the concrete ways of partitioning KBs and CQs and
evaluating CQ partitions over KB partitions are described. Moreover, a strategy
of optimizing the procedure of evaluating CQ partitions over KB partitions is
provided to improve the overall query answering performance. To verify our pro-
posal and approach, experiments are conducted on two web-scale open datasets,
i.e., DBpedia and BTC 2012 dataset, and the testing results indicate that the
proposed approach opens new possibilities for realizing performance-critical ap-
plications on the Web with both expressivity and high scalability.
     To the best of our knowledge, this is the first study on partitioning DL-LiteA
KBs and CQs for the purpose of efficiently reasoning with and querying large-
scale datasets. We assume that the readers are familiar with DL-LiteA and CQs.
Next, we illustrate our main scientific contributions, containing KB partitioning,
query partitioning, optimization and experiments. The technique details can be
seen in [1]. In this paper, for a CQ Q and DL-LiteA KB K, we use Ans(Q, K) to
denote the set of all the certain answers of Q over K. Moreover, we use Hd(Q)
and Bd(Q) to denote the head and body of Q, respectively. For two tuples x and
u with the same dimension, we use [x/u] to denote a substitution.


2   DL-LiteA KB partitioning
Here, we discuss partitioning DL-LiteA KBs into small chunks with the desired
feature, containing the formalization of KB partitions, the conditions of detecting
the desired KB partitions and the concrete way of computing such KB partitions.
                                      Title Suppressed Due to Excessive Length              3

Formalizing KB partitions. Before defining DL-LiteA KB partitions, we first
provide a way of computing a smaller sub-TBox for a given sub-ABox.
Procedure 1. For a DL-LiteA KB (T , A) and sub-ABox A0 ⊆ A, we use T |A0
to denote a sub-TBox of T constructed by the steps 1–2. Let T |A0 = ∅. Then:
Step 1. For each α ∈ T , add α to T |A0 if α has the form γ v ¬η or Fun(S) and each
        name in α occurs in (T |A0 , A0 ), or α has the form γ v η and the name in γ
        occurs in (T |A0 , A0 );
Step 2. Iterate Step 1, until T |A0 do not change anymore.
    Since for a sub-ABox, not all the axioms are relevant in terms of satisfiabil-
ity checking and CQ answering, and smaller TBoxes can make the considered
reasoning tasks be realized more quickly. The correctness is shown below:
Lemma 1. For a DL-LiteA KB (T , A) and A0 ⊆ A, (1) (T |A0, A0 ) is satisfiable
iff (T , A0 ) is satisfiable; and (2) Ans(Q, (T |A0, A0 )) = Ans(Q, (T , A0 )) for CQ Q.
    By Lemma 1, partitions of DL-LiteA KBs are defined in the definition below.
Definition 1. For DL-LiteA KB K = (T , A), the set S = ∪ni=1 {(Ti , Ai )} of DL-
LiteA KBs is called a partition of K if A = ∪ni=1 Ai and Ti = T |Ai . S is called a
SCSQA-local partition of K iff (1) K is satisfiable iff each KB in S is satisfiable;
and (2) if K is satisfiable then Ans(Q, K) = ∪K0 ∈S Ans(Q, K0 ) for each SQ Q.
Detecting SCSQA-local partitions. By analyzing the algorithms of satisfi-
ability checking and CQ rewriting [10], we found that in DL-LiteA , both satis-
fiability checking and SQ answering can be reduced to answering SQs over the
ABoxes of the corresponding KBs, shown in the lemma below.
Lemma 2. For a DL-LiteA KB K = (T , A), (1) there exists a set Q of SQs so
that K is satisfiable iff ∪Q∈Q Ans(Q, (∅, A) = ∅; and (2) if K is satisfiable then for
each SQ Q, there exists a set Q of SQs so that Ans(Q, K) = ∪Q0 ∈Q Ans(Q0, (∅, A)).
    All the query atoms in a SQ share a common individual or variable. Therefore,
by Lemma 2, we can get that in the scenario of KB partitioning, for the local
feature of satisfiability checking and SQ answering, the assertions in the original
KB and sharing a common individual should be put in an identical sub-KB.
Proposition 1. For DL-LiteA KB K = (T , A) and partition S of K, if for each
U ⊆ A satisfying that all the assertions in U share a common individual, there
exists (T 0 , A0 ) ∈ S such that U ⊆ A0 , then S is a SCSQA-local partition of K.
   Without redundant individual assertions [1] (Theorem 1), the condition in
Proposition 1 can be used as a necessary and sufficient condition.
Computing SCSQA-local partitions. By Proposition 1, SCSQA-local par-
titions of DL-LiteA KBs can be obtained by the following procedure.
Procedure 2. For a DL-LiteA KB K = (T , A), a SCSQA-local partition S of
K can be obtained by the following steps 1–2:
Step 1. Compute a partition A1 , · · · , An of A by the following two sub-steps a–b:
        a. Compute the set U = {Ua |a ∈ I} where I is the set of all the individuals of
           A, and Ua is the set consisting of all the assertions in A that contain a.
        b. Compute a partition U1 , · · · , Un of U. For each Ui , let Ai = ∪e∈Ui e.
Step 2. For each 1 ≤ i ≤ n, compute the TBox T |Ai for Ai . Let S = ∪n    i=1 {(T |Ai , Ai )}.
4        Gu et al.

     Different ways of partitioning U in step 1.b lead to different SCSQA-local
partitions of K. In terms of distribution and parallelization, the partitions gen-
erating more balanced, smaller sized sub-ABoxes are more likely to achieve the
efficiency improvement. For a given size asize of ABoxes in KB partition, a par-
tition of U generating sub-ABoxes sized no more than asize can be obtained by:
b1. Compute an ordered tuple O = U1 , · · · , Um of the elements in U such that |Ui | ≤
    |Ui+1 | for 1 ≤ i ≤ m − 1;
b2. Continue the following procedure P, until O does not contain any element.
     P. Generate a sub-ABox A0 . Let A0 = ∅. For i from |O| to 1, if |A0 ∪ O[i]| ≤ asize
        then let A0 = A0 ∪ O[i]; otherwise for j from 1 to i − 1, if |A0 ∪ O[j]| ≤ asize
        then let A0 = A0 ∪O[j], otherwise break the first loop. Remove all the elements
        used to generate A0 from O.
The complexity of computing SCSQA-local partitions of DL-LiteA KBs is shown
below. Low complexity makes the proposed approach appealing for partitioning a
large-scale KB into smaller, independent sub-KBs with the desired local feature.
Theorem 1. For a KB K = (T , A), a SCSQA-local partition of K can be com-
puted in O(|T |2 ) w.r.t. |T | and in O(|A|) w.r.t. |A|.

3    Conjunctive query partitioning and evaluation
Here, we discuss CQ partitioning and evaluation, containing the formalization of
CQ partitions, the conditions of detecting CQ partitions and the concrete way
of computing and evaluating CQ partitions.
Formalizing CQ partitions. CQ partitions are defined in the definition below.
Definition 2. For a CQ Q, we say the set Q = ∪ni=1 {qi } of CQs is a partition
of Q if (1) Bd(Q) = ]ni=1 Bd(qi ) and (2) for each 1 ≤ i ≤ n, Hd(qi ) consists of all
the variables in Bd(qi ) that also occur either in Hd(Q) or ∪nk=1,k6=i Bd(qk ). We
say that Q is a SQ-partition of Q iff all the queries in Q are SQs.
   The concrete way of merging the certain answers of the sub-queries and the
correctness are shown in the following definition and lemma, respectively.
Definition 3. For a partition Q = ∪ni=1 {qi } of a CQ Q and a partition S of a
DL-LiteA KB K, we define:
      Ans(Q, Q, S) = {x[x1 /u1 , · · · , xn /un ]|(u1 , · · · , un ) ∈ Λ1 × · · · × Λn }
where x = Hd(Q), for each 1 ≤ i ≤ n, xi = Hd(qi ) and Λi = ∪K0 ∈S Ans(qi , K0 ), and
for each 1 ≤ i, j ≤ n, 1 ≤ k ≤ |ui | and 1 ≤ l ≤ |uj |, if xi [k] = xj [l] then ui [k] = uj [l].
Lemma 3. Ans(Q, Q, S) ⊆ Ans(Q, K) holds.
    By the semantics of CQs, we know that distinguished (dist.) variables and
non-dist. variables of CQs [1] are interpreted in different ways. Based on this, we
define the concept of SQ-reducibility and show that SQ-reducible CQs can be
answered in the descried way, i.e., by first partitioning CQs into SQs and then
answering the smaller SQs over SCSQA-local partitions of DL-LiteA KBs.
Definition 4. For a CQ Q, we say Q is SQ-reducible iff it has a SQ-partition
Q such that Hd(q) ⊆ Q for each q ∈ Q.
                                   Title Suppressed Due to Excessive Length          5

Lemma 4. For a CQ Q, if Q is SQ-reducible then there exists a SQ-partition
Q of Q such that for each satisfiable DL-LiteA KB K and each SQSQA-local
partition S of K, Ans(Q, K) = Ans(Q, Q, S) holds.
Detecting SQ-reducible CQs. Not all CQs are SQ-reducible. This means that
there exist CQs Q and DL-LiteA KBs K such that evaluating any SQ-partition of
Q over any SCSQA-local partition of K cannot obtain all the certain answers of
Q over K. Thus in order to evaluate as many CQs in the desired way as possible,
we need to provide a iff condition to decide whether a CQ is SQ-reducible.
Theorem 2. A CQ Q is SQ-reducible iff there do not exist three query atoms
P (?x, ?y), α and α0 in Q such that ?x and ?y are non-dist. variables of Q, α
contains ?x but not ?y, and α0 contains ?y but not ?x.

Theorem 3. Detecting whether a CQ Q is SQ-reducible can be done in O(|Q|).
  Next, we present the concrete ways of partitioning and evaluating SQ-reducible
CQs and non-SQ-reducible CQs.
Partitioning and evaluating SQ-reducible CQs. A SQ-reducible CQ may
have many SQ-partitions, so we need to first detect the SQ-partitions that can
capture all the certain answers of CQs and then provide the procedure of comput-
ing the desired SQ-partitions and the complexity of computing such partitions.
Theorem 4. For a SQ-reducible CQ Q, let Q be any SQ-partition of Q such
that Hd(q) ⊆ Hd(Q) for each q ∈ Q. Then Ans(Q, K) = Ans(Q, Q, S) holds, for
any satisfiable DL-LiteA KB K and any SCSQA-local partition S of K.
Procedure 3. For a SQ-reducible CQ Q, a SQ-partition Q of Q satisfying that
Hd(q) ⊆ Hd(Q) for each q ∈ Q can be obtained by the following steps 1–2.
Step 1. Let A = Bd(Q) and U = ∅. Compute a partition U of Bd(Q) by the steps a-b.
        a. For each non-dist. variable x of Q, compute the set Ux consisting of all the
           atoms in A that contain x, and let U = U ∪ {Ux } and A = A − Ux . Then for
           every two different non-dist. variables x and y of Q that occur in a same
           atom in Q, replace Ux and Uy in U with Ux ∪ Uy ;
        b. For each dist. variable or individual x of Q, compute the set Ux consisting
           of all the atoms in A that contain x, and let U = U ∪ {Ux } and A = A − Ux .
Step 2. For U ∈ U, generate a CQ q such that Bd(q) = U and Hd(q) consists of all the
        variables occurring in U that also occur in Hd(Q) or Bd(Q)−U, add q to Q.

Theorem 5. For a SQ-reducible CQ Q, a partition Q of Q satisfying hd(q) ⊆
hd(Q) for each q ∈ Q can be obtained in O(|Q|).
Partitioning and evaluating non-SQ-reducible CQs. If a CQ is not SQ-
reducible, then non-SQ-partitions need to be evaluated to obtain all the certain
answers of this CQ. By the semantics of CQs, we find that if a certain answer
of a CQ Q over a KB is obtained by binding some non-dist. variables NV to
anonymous elements, then in the scenario of KB and query partitioning, this
certain answer can solely be obtained by evaluating a partition of Q in which the
sub-queries do not contain the variables in NV as dist. variables. The definition
below illustrates the partitions that can capture such certain answers of Q.
6       Gu et al.

Definition 5. For a CQ Q and set NV of non-dist. variables of Q, we use Q|NV     SQ
to denote a partition of Q satisfying that (1) each variable in NV solely occurs
in one query in Q, and (2) for each non-simple query q ∈ Q and for every two
different atoms α and α0 in q, there exists a sequence β1 , · · · , βn of atoms in q
such that β1 = α, βn = α0 and βi and βi+1 share a variable in NV for 1 ≤ i ≤ n−1.
   The concrete way of answering non-SQ-reducible CQs and the procedure of
computing the desired partitions as well as the complexity are shown below.
Theorem 6. Given a non-SQ-reducible CQ Q, for any satisfiable DL-LiteA KB
K and any SCSQA-local partition S of K, Ans(Q, K) = ∪NV’∈2NV Ans(Q, Q|NV’ SQ , S)
holds where NV is the set consisting of all the non-dist. variables of Q.
Procedure 5. For a non-SQ-reducible CQ Q and set NV of some non-dist.
variables of Q, the partition Q|NV
                                SQ , can be obtained by the following two steps:
Step 1. Let A = Bd(Q) and U = ∅. Compute a partition U of Bd(Q) by the steps a–c.
        a. Compute an undirected graph G with node set A. For each (α, α0 ) ∈ (A)2 ,
           there exists an edge between α and α0 iff they share a variable in NV;
        b. For each connected component l of G, generate an atom set U consisting of
           all the nodes in l, and set U = U ∪ {U} and A = A − U;
        c. For each variable or individual x occurring in A, compute a set U consisting
           of all the atoms in A that contain x, and set U = U ∪ {U} and A = A − U.
Step 2. For each U ∈ U, generate a CQ q so that Bd(q) = U and Hd(q) consists of all
        the variables in U that also occur in Hd(Q) or Bd(Q) − U, and add q to Q.
Theorem 7. The complexity of computing the partition Q|NV              2
                                                       SQ of Q is O(|Q| ).


4    Optimizing the procedure of query partition evaluation
Although query partitioning can reduce the final number of queries needed to
be answered over the ABox of a KB significantly, a large amount of intermediate
results generated by answering sub-queries may also slow down the overall query
answering procedure. Thus we further provide an optimization strategy to reduce
the intermediate results as many and as early as possible by transferring values
between sub-queries, as shown in the following example and proposition.
Example 1. Let S be a partition of a KB. Consider the partition Q of a CQ Q:
                    Q : R(a, ?x) ∧ P (?x, ?y) ∧ S(?y, ?z) → q(?x, ?y, ?z)
      Q = {q1 : R(a, ?x) ∧ P (?x, ?y) → q(?x, ?y), q2 : S(?y, ?z) → q(?y, ?z)}
After answering q1 over the KBs in S, suppose we obtain the answer set Λ =
∪3i=1{(ai , bi )}. If we transfer the bindings of ?y to q2 before it is evaluated, i.e.,
replacing answering q2 with answering the CQs ∪3i=1{S(bi , ?z) → q(bi , ?z)}, then
we can avoid computing the answers of q2 which do not bind ?y to b1 , b2 or b3 .
Proposition 2. For a partition Q = ∪ni=1 {qi } of a CQ Q and a partition S of
a satisfiable DL-LiteA KB, the following equation holds:
  Ans(Q, Q, S) = {x[x1 /u1 , · · · , xn /un ]|(u1 , · · · , un ) ∈ Λ1 |B1 × · · · × Λn |Bn }
where x = Hd(Q), for 1 ≤ i ≤ n, xi = Hd(qi ), B1 = ∅ and for 1 < k ≤ n:
 Bk = {[x1 /u1 , · · · , xk−1 /uk−1 ]|xk |(u1 , · · · , uk−1 ) ∈ Λ1 |B1 × · · · × Λk−1 |Bk−1 }
where Λ1 |B1 = ∪K∈S Ans(q1 , K) and for 1 < j ≤ n, Λj |Bj = ∪K∈S ∪ξ∈Bj Ans(qj ξ, K).
                                 Title Suppressed Due to Excessive Length         7

    In Proposition 2, Bk denotes the set of the bindings of some variables of qk
obtained from the certain answers of the sub-queries evaluated before qk . By
observing the CQ rewriting procedure in DL-LiteA , we find that the transferred
variable bindings do not affect the query rewriting procedure. This means that
when answering a sub-query q over a sub-KB K = (T , A) with a set B of variable
bindings, the CQs obtained by first transferring the bindings in B to q and then
rewriting the resultant queries over T are equivalent to the CQs obtained by
first rewriting q over T and then transferring the bindings in B to the rewritten
queries. Thus we actually need one time of query rewriting rather than |B|. Hence
the procedure of evaluating the partitions of queries can be further optimized.
Proposition 3. For a CQ q, satisfiable DL-LiteA KB K = (T , A) and set B of
functions ξ that maps some variables of q to names, then we can get that:
       ∪ξ∈B PerfectRef(qξ, T ) = ∪ξ∈B {q 0 ξ|q 0 ∈ PerfectRef(q, T )}
                ∪ξ∈B Ans(qξ, K) = ∪q0 ∈PerfectRef(q,T ) ∪ξ∈B Ans(q 0 ξ, (∅, A))
where PerfectRef denotes one classical query rewriting algorithm in DL-LiteA .
   Experiments presented in the subsequent section demonstrate that the value
transferring strategy can significantly improve the query answering performance.

5   The evaluation
In order to demonstrate the efficiency and rationality of our approach and pro-
posal, we have implemented a prototype system, where multi-thread techniques
are adopted to check the satisfiability of and answer a single query over the sub-
KBs in a KB partition in a parallel way. We run the system on two web-scale,
real-world datasets, i.e., DBpedia 200 dataset extended with DBpedia ontology
and BTC 2012 dataset. Next, we present the main results of the experiments.
SCSQA-local partition computation. After materializing the name equiva-
lence relations, two DL-LiteA KBs Kdbp and Kbtc are constructed, respectively
with 233,227 axioms and 0.2 billion individual assertions and 339,519 axioms and
1.04 billion individual assertions. We randomly choose two numbers, 28 million
and 52 million, to respectively limit the size of the sub-ABoxes in KB partitions.
Then, a SCSQA-local partition Sdbp of Kdbp with 14 sub-KBs and a SCSQA-local
partition Sbtc of Kbtc with 30 sub-KBs are computed by Procedure 2.
Satisfiability checking. Without KB partition, checking the satisfiability of
Kdbp totally cost 6 minutes, and for Kbtc , after 6 hours an out-of-memory error
was reported caused by the failure in completing generating the queries needed
to be evaluated over its ABox. However, facilitated by KB partitioning equipped
with parallelization, satisfiability checking of Kdbp and Kbtc has been success-
fully done in 0.07 and 0.15 minutes, respectively. The testing results indicate
that KB partitioning equipped with parallelization can significantly improve the
performance of checking the satisfiability of web-scale DL-LiteA KBs.
Query answering checking. In order to detect the effect of KB and query par-
titioning as well as sub-query value transfer on the performance of CQ answering,
we design 14 CQs (8 SQs and 6 non-SQs) for Kdbp and Kbtc , respectively.
8       Gu et al.

    The SQ answering results indicate that no matter a SQ has a large or small
number of rewritten queries over the original KB or it can be evaluated efficiently
or inefficiently, in the situation of KB partitioning equipped with parallelization,
this SQ can be answered in an extremely efficient way. On the other hand, the
results of evaluating the tested non-SQ queries indicate that: (1) when a CQ
cannot be evaluated over the original KB efficiently, query partitioning combined
with KB partitioning and sub-query value transfer can make this query to be
evaluated in a very efficient way; (2) no matter with or without KB partitioning,
query partitioning can reduce the number of rewritten CQs significantly; (3)
even when query partitioning cannot reduce the number of the rewritten queries,
evaluating query partitions over KB partitions with sub-query value transfer can
speed up the procedure; and (4) for the queries already efficiently answerable
over the original KBs, our approach may not further improve the performance.
Comparing with related systems. To demonstrate the rationality and effi-
ciency of our proposal and approach, we have compared our system with the two
state-of-the-art, centralized triple stores, Jena-TDB and Virtuoso which support
RDFS reasoning. The comparison results can be summarized as: (1) even with
KB and query partitioning, our approach is not as efficient as the two RDF
stores which evaluate queries directly without taking reasoning into considera-
tion; (2) when reasoning is considered, the performance of the two triple stores
drops significantly, and they do not perform as well as our approach; and (3)
query answering without considering reasoning can miss many certain answers.

6   Comparison with related work
Compared with the works on querying LOD [4–9], our approach pursues both
expressivity and scalability, capturing more schema knowledge and handling
more expressive queries with the support of DL-LiteA techniques. For scala-
bility, we provide a divide-and-conquer reasoning and query answering approach
for DL-LiteA so that the massive knowledge in LOD can be managed by parallel
and distribution techniques.
Compared with the works on DL KB modularity [12–14], we study a tractable
language DL-LiteA , and concentrate on satisfiability checking and CQ answering
rather than entailment checking. Moreover, we provide a way of ABox partition-
ing with linear data complexity where reasoning is not necessary. Compared
with the works on RDF graph partitioning [15–17], we consider RDF graphs as
DL-LiteA KBs so that constraints in these graphs can be captured.
Compared with the works on query rewriting optimization [18–21], our ap-
proach features query rewriting optimization in DL-LiteA from the distinctive
perspective of query and KB partitioning, and the experiments on real-world
datasets show that the number of rewritten queries can be significantly reduced.
Compared with the works [22,23] on fast satisfiability checking, we propose to
improve the performance in DL-LiteA from the perspective of KB partitioning.
Experiments on web-scale datasets demonstrate the efficiency of our approach.
Acknowledgments. This work has been supported by the National Key Research and
Development Program of China under grants 2017YFC1700300 and 2017YFB1002300.
                                    Title Suppressed Due to Excessive Length          9

References
1. Gu, Z., Zhang, S., Cao, C,: Reasoning and Querying Web-scale Open Data based
   on DL-LiteA in a Divide-and-Conquer Way. J. Web Semant. 55: 122-144 (2019).
2. O’Riain, S., Curry, E., Harth, A.: XBRL and open data for global financial ecosys-
   tems: A linked data approach. Int. J. Accounting Inf. Systems. 13 (2):141-162 (2012).
3. Bauer, F., Kaltenböck, M.: Linked open data: The essentials, Edition
   mono/monochrom, Vienna, Austria, 2011.
4. Bishop, B., Kiryakov, A., Ognyanov, D., Peikov, I., Tashev, Z., Velkov, R.: Fact-
   Forge: A fast track to the web of Data. Semantic Web, 2 (2) (2011).
5. Schwarte, A., Haase, P., Hose, K., Schenkel, R., Schmidt, M.: FedX: Optimization
   Techniques for Federated Query Processing on Linked Data. In: ISWC, 2011.
6. Umbrich, J., Hogan, A., Polleres, A., Decker, S.: Link Traversal Querying for a
   diverse Web of Data. Semantic Web, 6 (6) (2012) 585-624.
7. Montoya, G., Skaf-Molli, H., Molli, P., Vidal, M. E.: Federated SPARQL Queries
   Processing with Replicated Fragments. In: ISWC, 2015.
8. Hartig, O., Freytag, J. C.: Foundations of traversal based query execution over
   Linked Data. In: ACM Hypertext, 2012.
9. Sabri, M. M.: A Hybrid Framework for Online Execution of Linked Data Queries.
   In: WWW, 2015.
10. Poggi, A., Lembo, L., Calvanese, D., De Giacomo, G., Lenzerini, M., Rosati, R.:
   Linking Data to Ontologies. J. Data Semantics. 10 (2008) 133-173.
11. Calvanese, D., De Giacomo, G., Lembo, D., Lenzerini, M., Rosati, R.: Tractable
   reasoning and efficient query answering in description logics: The DL-Lite family.
   J. Autom. Reasoning. 39 (3) (2007).
12. Grau, B. C., Horrocks, I., Kazakov, Y., Sattler, U.: Modular reuse of ontologies:
   theory and practice. J. Artif. Intell. Res. 31 (2008).
13. Botoeva, E., Konev, B., Lutz, C., Ryzhikov, V., Wolter, F., Zakharyaschev, M.:
   Inseparability and conservative extensions of description logic ontologies: A survey.
   In: Reasoning Wb, 2016.
14. Xu, J., Shironoshita, P., Visser, U., John, N., Kabuka, M.: Module Extraction for
   Efficient Object Queries over Ontologies with Large ABoxes. Artif Intell Appl. 2 (1)
   (2015).
15. Harbi, R., Abdelaziz, I., Kalnis, P., Mamoulis, N.: Evaluating SPARQL queries on
   massive RDF datasets. Proceedings of the VLDB Endowment, 8 (12) (2015).
16. Potter, A., Motik, B., Horrocks, I.: Querying Distributed RDF Graphs: The Effects
   of Partitioning. In: SSWS@ISWC, 2014.
17. Yang, S., Yan, X., Zong, B., Khan, A.: Towards effective partition management for
   large graphs. In: SIGMOD Conference, 2012.
18. Rosati, R., Almatelli, A.: Improving Query Answering over DL-Lite Ontologies.
   In: KR, 2010.
19. Venetis, T., Stoilos, G., Stamou, G. B.: Query extensions and incremental query
   rewriting for OWL 2 QL ontologies. J. Data Semantics, 3 (1) (2014).
20. Bursztyn, D., Goasdoué, F., Manolescu, I.: Teaching an RDBMS about ontological
   constraints. Proceedings of the VLDB Endowment, 9 (12) (2016).
21. Gottlob, G., Orsi, G., Pieris, A.: Query rewriting and optimization for ontological
   databases. ACM Trans. Database Syst. 39 (3) (2014).
22. Paulheim, H., Stuckenschmidt, H.: Fast Approximate A-Box Consistency Checking
   Using Machine Learning. In: ESWC, 2016.
23. Fokoue, A., Kershenbaum, A., Ma, L., Schonberg, E., Srinivas, K.: The Summary
   Abox: Cutting Ontologies Down to Size. In: ISWC, 2006.