=Paper= {{Paper |id=Vol-1350/paper-21 |storemode=property |title=Lower and Upper Bounds for SPARQL Queries over OWL Ontologies |pdfUrl=https://ceur-ws.org/Vol-1350/paper-21.pdf |volume=Vol-1350 |dblpUrl=https://dblp.org/rec/conf/dlog/GlimmKKS15 }} ==Lower and Upper Bounds for SPARQL Queries over OWL Ontologies== https://ceur-ws.org/Vol-1350/paper-21.pdf
     Lower and Upper Bounds for SPARQL Queries over
                    OWL Ontologies

          Birte Glimm1 Yevgeny Kazakov1 Ilianna Kollia2 and Giorgos Stamou2
                 1
                   Ulm University, Germany, @uni-ulm.de
 2
     National Technical University of Athens, Greece, ilianna2@mail.ntua.gr, gstam@cs.ntua.gr


Introduction
The recent standardization of the SPARQL 1.1 Entailment Regimes [2], which extend
the SPARQL Query Language [3] with the capability of querying also for implicit
knowledge makes the need for an efficient evaluation of complex queries over OWL
ontologies urgent. We present an approach for optimizing the evaluation of SPARQL
queries over OWL ontologies using SPARQL’s OWL Direct Semantics entailment regi-
me. Such queries consist of axiom templates, i.e., Description Logic (DL) axioms with
variables in place of concept, role and individual names. Answers to such queries are
mappings of query (concept, role or individual) variables to corresponding (concept,
role or individual) names that instantiate the axiom templates to axioms entailed by the
queried knowledge base (KB).
    Since computing query answers over an expressive KB is computationally very
costly, approximation techniques have been proposed that use a weakened version of
the KB to compute a lower bound (yields sound but potentially incomplete results) and
a strengthened version to compute an upper bound (yields complete but potentially un-
sound results) for the results [9, 7, 8]. Another well-known technique is to compute the
bounds from a complete and clash-free tableau generated by a DL reasoner [4, 5]. De-
terministically derived facts are used as lower bound, while also non-deterministically
derived ones are considered for the upper bound. Answers in the “gap”, i.e., potential
answers in the upper but not the lower bound, usually have to be checked individually
by performing a consistency check with a fully fledged OWL 2 DL reasoner.
    While we also use bounds, we allow for much more expressive queries than related
approaches. To optimize the evaluation of possible query answers in the gap, we present
a query extension approach that uses the TBox of the queried KB to extend the query
with additional parts. We show that the resulting query is equivalent to the original one
and we use the additional parts that are simple to evaluate for restricting the bounds of
subqueries of the initial query. In an empirical evaluation we show that the proposed
query extension approach can lead to a significant decrease in the query execution time
of up to four orders of magnitude. More details about our method as well as more
evaluation results can be found in the extended version of our paper [1].

Improving Bounds via Query Extension
We will show the intuition of the proposed query extension method through an example.
Let K be a KB, A, B, C be concept names, r a role name, a, b, c, d individual names, x
an individual variable, Y a concept variable and

             T = { B v A t C, ∃r.B v C} A = {A(a), B(b), r(a, b), A(d)}
               q = {A(x), ∃r.Y(x), Y v B}

Note that the only answer for q over K is the mapping {µ | µ(x) = a, µ(Y) = B}.
     First we compute bounds for axiom templates of q over K. Since {A(a), A(d)} ⊆ K
we have K |= A(a) and K |= A(d). That is, we can find the lower bound L = {µ |
µ(x) ∈ {a, d}} for the query {A(x)} over K without performing any tests. To find an
upper bound for {A(x)}, we can use a model I of K. It is easy to check that K has a
model I = (∆I , ·I ) with ∆I = {d1 , d2 , d3 , d4 }, aI = d1 , bI = d2 , cI = d3 , dI = d4 ,
AI = {d1 , d2 , d4 }, BI = {d2 }, C I = {d1 } and rI = {hd1 , d2 i}. Note that I 6|= A(c).
Thus, from this model alone one can conclude that K 6|= A(c) and hence that the set
U = {µ | µ(x) ∈ {a, b, d}} provides an upper bound for the query {A(x)} over K.
Although the model I can be similarly used for finding an upper bound for complex
templates, such as {∃r.Y(x)}, in general it can only be found by iterating over all possible
mappings for x and Y and checking which instances of this template are entailed by the
model. Therefore, in practice, one does not compute the bounds for complex templates.
The bounds for the query {Y v B} can be computed by classifying the KB and retrieving
subsumption relationships for B. Since for classification one usually needs to consider
just the (relatively small) TBox T , the bounds for this query can be computed exactly,
i.e., in our example we have L = U = {µ | µ(Y) ∈ {⊥, B}}.
     Our query extension method uses additionally the notion of a subquery bound,which
provides a range for those answers of a subquery (subset of templates) of q that are
sufficient to evaluate q. The intuition of our method is that subquery bounds can be
improved using bounds of other subqueries of this query. Thus, if a query q can be
extended to an equivalent query q ∪ q0 , the number of reasoner calls performed for
evaluating q can be reduced using q0 . The proposed algorithm can be summarized as
follows: First, we replace every (concept, role, individual) variable in q with a fresh
distinct (concept, role, individual) name. For our example, consider a mapping µ such
that µ(x) = a x , µ(Y) = AY with a x an individual and AY a concept name. Then we
materialize and classify K plus µ(q), where µ(q) denotes the result of replacing each
variable x in q with µ(x), i.e., we compute all concept assertions A(a), role assertions
r(a, b), and subsumptions A v B with atomic concepts and roles entailed by K ∪ µ(q).
Afterwards, we replace names back with their corresponding variables in the extended
KB. In our example T ∪ µ(q) = {B v A t C, ∃r.B v C, A(a x ), ∃r.AY (a x ), AY v B} |=
C(a x ). Thus, for q0 = {C(x)}, we have K ∪ µ(q) |= C(a x ) = µ(q0 ), and it holds that the
query q has the same answers for K as the extended query q ∪ q0 = {A(x), ∃r.Y(x), Y v
B, C(x)}. In the end, we compute query bounds for the templates in q0 and use them to
improve the subquery bounds for templates in q. Using again the model I for K, since
I |= C(a), but I 6|= C(b), I 6|= C(c) and I 6|= C(d), we can derive the upper bound
U = {µ | µ(x) = a} for the query {C(x)}. Using this upper bound, it is now possible to
reduce the upper bound for the subquery {A(x)} of q to U. Since U is a subset of the
lower bound for {A(x)} (computed in the beginning of the section), this subquery can
be evaluated without performing any further entailment tests. The new upper bound U
can also be used to further reduce the upper bound for the subquery {∃r.Y(x)} of q to
Table 1. Query answering times in seconds (n/a indicates a timeout, > 30 min) and number of
performed entailment checks for UOBM with the first department using evalStatic and evalExt

                                                                 evalStatic     evalExt
UOBM                                                            time #entail time #entail
q1 = {isAdvisedBy(x, y), GraduateStudent(x), Woman(y)},        20.84     47 10.54     19
     q01 = {Professor(y)}
q2 = {isTaughtBy(x, y), GraduateCourse(x), Woman(y)},          21.63     51 12.06     26
     q02 = {Faculty(y)}
q3 = {teachingAssistantOf(x, y), GraduateCourse(y), Woman(x)}, 12.78     32 5.60      12
     q03 = {TeachingAssistant(x)}
q4 = {∃worksFor.Organization(x), Woman(x)},                       n/a        18.36   135
     q04 = {Employee(x)}
q5 = {X v ∃isHeadOf.Department, ∃isTaughtBy.X(y)},                n/a         1.99     4
     q05 = {X v Chair, CourseTaughtByChair(y)}
q6 = {X v ∃isHeadOf.College, ∃isAdvisedBy.X(y),                   n/a         0.21     0
     q06 = {X v Dean, PersonAdvisedByDean(y)}



{µ | µ(x) = a, µ(Y) ∈ {⊥, B}}. After this reduction, this subquery can be evaluated using
just two entailment tests.


Evaluation
The proposed method has been implemented and evaluated over a set of well-known
benchmark ontologies and relevant datasets and for several forms of queries. Although
it can be used, in general, for improving the performance of most query answering
systems based on query bounds, here the evaluation is based on the system described in
Kollia et al. [5], which, to the best of our knowledge, is the only system that supports
the evaluation of complex queries over OWL 2 DL ontologies under the OWL Direct
Semantics entailment regime of SPARQL 1.1. In our implemented method (referred
to as evalExt) we improve the subquery bounds computed in evalStatic [5] using the
method described in the previous section. Afterwards, we perform the ordering and
evaluation methods of Kollia et al. using the improved subquery bounds.
    In Table 1 we show the results of the evaluation on the University Ontology Bench-
mark (UOBM) [6] using a range of custom queries since the queries provided for
UOBM are only simple conjunctive instance queries. Column 1 shows the query qi
and extension templates q0i (1 ≤ i ≤ 6), columns 2 and 3 show the query answering
times and the number of performed entailment checks for evalStatic, respectively, and
columns 4 and 5 show the respective numbers for evalExt. In all queries the time spent
for query extension is negligible compared to the time spent for query evaluation. We
observe that for all queries, additional extension templates were derived, which have
significantly better query bounds than the complex templates of the queries. This di-
rectly translates to a significantly lower number of entailment checks for evalExt and,
hence, a reduction in execution times. The reduction in query answering times is up to
four orders of magnitude.
References
1. Glimm, B., Kazakov, Y., Kollia, I., Stamou, G.: Lower and upper bounds for SPARQL
   queries over OWL ontologies. In: Proceedings of the 29th Conference on Artificial Intelli-
   gence (AAAI’15) (2015)
2. Glimm, B., Ogbuji, C. (eds.): SPARQL 1.1 Entailment Regimes. W3C Recommendation
   (2013), available at http://www.w3.org/TR/sparql11-entailment/
3. Harris, S., Seaborne, A. (eds.): SPARQL 1.1 Query Language. W3C Recommendation (2013),
   available at http://www.w3.org/TR/sparql11-query/
4. Kollia, I., Glimm, B.: Cost based query ordering over OWL ontologies. In: Proceedings of
   the 11th International Semantic Web Conference (ISWC 2012). Lecture Notes in Computer
   Science, vol. 7649, pp. 231–246. Springer (2012)
5. Kollia, I., Glimm, B.: Optimizing SPARQL Query Answering over OWL Ontologies. Journal
   of Artificial Intelligence Research 48, 253–303 (2013)
6. Ma, L., Yang, Y., Qiu, Z., Xie, G., Pan, Y., Liu, S.: Towards a complete OWL ontology bench-
   mark. In: The Semantic Web: Research and Applications, pp. 125–139. Lecture Notes in Com-
   puter Science, Springer (2006)
7. Pan, J.Z., Thomas, E., Zhao, Y.: Completeness guaranteed approximation for OWL-DL
   query answering. In: Proceedings of the 22nd International Workshop on Description Log-
   ics (DL’09). CEUR Workshop Proceedings, vol. 477. CEUR-WS.org (2009), http://dblp.
   uni-trier.de/db/conf/dlog/dlog2009.html#PanTZ09
8. Ren, Y., Pan, J.Z., Zhao, Y.: Soundness preserving approximation for TBox reasoning. In:
   Proceedings of the 25th National Conference on Artificial Intelligence (AAAI’10). AAAI
   Press (2010)
9. Zhou, Y., Nenov, Y., Grau, B.C., Horrocks, I.: Pay-as-you-go OWL query answering using a
   triple store. In: Proceedings of the 28th Conference on Artificial Intelligence (AAAI’14). pp.
   1142–1148 (2014)