=Paper=
{{Paper
|id=Vol-1350/paper-21
|storemode=property
|title=Lower
and Upper Bounds for SPARQL Queries over OWL Ontologies
|pdfUrl=https://ceur-ws.org/Vol-1350/paper-21.pdf
|volume=Vol-1350
|dblpUrl=https://dblp.org/rec/conf/dlog/GlimmKKS15
}}
==Lower
and Upper Bounds for SPARQL Queries over OWL Ontologies==
Lower and Upper Bounds for SPARQL Queries over OWL Ontologies Birte Glimm1 Yevgeny Kazakov1 Ilianna Kollia2 and Giorgos Stamou2 1 Ulm University, Germany,@uni-ulm.de 2 National Technical University of Athens, Greece, ilianna2@mail.ntua.gr, gstam@cs.ntua.gr Introduction The recent standardization of the SPARQL 1.1 Entailment Regimes [2], which extend the SPARQL Query Language [3] with the capability of querying also for implicit knowledge makes the need for an efficient evaluation of complex queries over OWL ontologies urgent. We present an approach for optimizing the evaluation of SPARQL queries over OWL ontologies using SPARQL’s OWL Direct Semantics entailment regi- me. Such queries consist of axiom templates, i.e., Description Logic (DL) axioms with variables in place of concept, role and individual names. Answers to such queries are mappings of query (concept, role or individual) variables to corresponding (concept, role or individual) names that instantiate the axiom templates to axioms entailed by the queried knowledge base (KB). Since computing query answers over an expressive KB is computationally very costly, approximation techniques have been proposed that use a weakened version of the KB to compute a lower bound (yields sound but potentially incomplete results) and a strengthened version to compute an upper bound (yields complete but potentially un- sound results) for the results [9, 7, 8]. Another well-known technique is to compute the bounds from a complete and clash-free tableau generated by a DL reasoner [4, 5]. De- terministically derived facts are used as lower bound, while also non-deterministically derived ones are considered for the upper bound. Answers in the “gap”, i.e., potential answers in the upper but not the lower bound, usually have to be checked individually by performing a consistency check with a fully fledged OWL 2 DL reasoner. While we also use bounds, we allow for much more expressive queries than related approaches. To optimize the evaluation of possible query answers in the gap, we present a query extension approach that uses the TBox of the queried KB to extend the query with additional parts. We show that the resulting query is equivalent to the original one and we use the additional parts that are simple to evaluate for restricting the bounds of subqueries of the initial query. In an empirical evaluation we show that the proposed query extension approach can lead to a significant decrease in the query execution time of up to four orders of magnitude. More details about our method as well as more evaluation results can be found in the extended version of our paper [1]. Improving Bounds via Query Extension We will show the intuition of the proposed query extension method through an example. Let K be a KB, A, B, C be concept names, r a role name, a, b, c, d individual names, x an individual variable, Y a concept variable and T = { B v A t C, ∃r.B v C} A = {A(a), B(b), r(a, b), A(d)} q = {A(x), ∃r.Y(x), Y v B} Note that the only answer for q over K is the mapping {µ | µ(x) = a, µ(Y) = B}. First we compute bounds for axiom templates of q over K. Since {A(a), A(d)} ⊆ K we have K |= A(a) and K |= A(d). That is, we can find the lower bound L = {µ | µ(x) ∈ {a, d}} for the query {A(x)} over K without performing any tests. To find an upper bound for {A(x)}, we can use a model I of K. It is easy to check that K has a model I = (∆I , ·I ) with ∆I = {d1 , d2 , d3 , d4 }, aI = d1 , bI = d2 , cI = d3 , dI = d4 , AI = {d1 , d2 , d4 }, BI = {d2 }, C I = {d1 } and rI = {hd1 , d2 i}. Note that I 6|= A(c). Thus, from this model alone one can conclude that K 6|= A(c) and hence that the set U = {µ | µ(x) ∈ {a, b, d}} provides an upper bound for the query {A(x)} over K. Although the model I can be similarly used for finding an upper bound for complex templates, such as {∃r.Y(x)}, in general it can only be found by iterating over all possible mappings for x and Y and checking which instances of this template are entailed by the model. Therefore, in practice, one does not compute the bounds for complex templates. The bounds for the query {Y v B} can be computed by classifying the KB and retrieving subsumption relationships for B. Since for classification one usually needs to consider just the (relatively small) TBox T , the bounds for this query can be computed exactly, i.e., in our example we have L = U = {µ | µ(Y) ∈ {⊥, B}}. Our query extension method uses additionally the notion of a subquery bound,which provides a range for those answers of a subquery (subset of templates) of q that are sufficient to evaluate q. The intuition of our method is that subquery bounds can be improved using bounds of other subqueries of this query. Thus, if a query q can be extended to an equivalent query q ∪ q0 , the number of reasoner calls performed for evaluating q can be reduced using q0 . The proposed algorithm can be summarized as follows: First, we replace every (concept, role, individual) variable in q with a fresh distinct (concept, role, individual) name. For our example, consider a mapping µ such that µ(x) = a x , µ(Y) = AY with a x an individual and AY a concept name. Then we materialize and classify K plus µ(q), where µ(q) denotes the result of replacing each variable x in q with µ(x), i.e., we compute all concept assertions A(a), role assertions r(a, b), and subsumptions A v B with atomic concepts and roles entailed by K ∪ µ(q). Afterwards, we replace names back with their corresponding variables in the extended KB. In our example T ∪ µ(q) = {B v A t C, ∃r.B v C, A(a x ), ∃r.AY (a x ), AY v B} |= C(a x ). Thus, for q0 = {C(x)}, we have K ∪ µ(q) |= C(a x ) = µ(q0 ), and it holds that the query q has the same answers for K as the extended query q ∪ q0 = {A(x), ∃r.Y(x), Y v B, C(x)}. In the end, we compute query bounds for the templates in q0 and use them to improve the subquery bounds for templates in q. Using again the model I for K, since I |= C(a), but I 6|= C(b), I 6|= C(c) and I 6|= C(d), we can derive the upper bound U = {µ | µ(x) = a} for the query {C(x)}. Using this upper bound, it is now possible to reduce the upper bound for the subquery {A(x)} of q to U. Since U is a subset of the lower bound for {A(x)} (computed in the beginning of the section), this subquery can be evaluated without performing any further entailment tests. The new upper bound U can also be used to further reduce the upper bound for the subquery {∃r.Y(x)} of q to Table 1. Query answering times in seconds (n/a indicates a timeout, > 30 min) and number of performed entailment checks for UOBM with the first department using evalStatic and evalExt evalStatic evalExt UOBM time #entail time #entail q1 = {isAdvisedBy(x, y), GraduateStudent(x), Woman(y)}, 20.84 47 10.54 19 q01 = {Professor(y)} q2 = {isTaughtBy(x, y), GraduateCourse(x), Woman(y)}, 21.63 51 12.06 26 q02 = {Faculty(y)} q3 = {teachingAssistantOf(x, y), GraduateCourse(y), Woman(x)}, 12.78 32 5.60 12 q03 = {TeachingAssistant(x)} q4 = {∃worksFor.Organization(x), Woman(x)}, n/a 18.36 135 q04 = {Employee(x)} q5 = {X v ∃isHeadOf.Department, ∃isTaughtBy.X(y)}, n/a 1.99 4 q05 = {X v Chair, CourseTaughtByChair(y)} q6 = {X v ∃isHeadOf.College, ∃isAdvisedBy.X(y), n/a 0.21 0 q06 = {X v Dean, PersonAdvisedByDean(y)} {µ | µ(x) = a, µ(Y) ∈ {⊥, B}}. After this reduction, this subquery can be evaluated using just two entailment tests. Evaluation The proposed method has been implemented and evaluated over a set of well-known benchmark ontologies and relevant datasets and for several forms of queries. Although it can be used, in general, for improving the performance of most query answering systems based on query bounds, here the evaluation is based on the system described in Kollia et al. [5], which, to the best of our knowledge, is the only system that supports the evaluation of complex queries over OWL 2 DL ontologies under the OWL Direct Semantics entailment regime of SPARQL 1.1. In our implemented method (referred to as evalExt) we improve the subquery bounds computed in evalStatic [5] using the method described in the previous section. Afterwards, we perform the ordering and evaluation methods of Kollia et al. using the improved subquery bounds. In Table 1 we show the results of the evaluation on the University Ontology Bench- mark (UOBM) [6] using a range of custom queries since the queries provided for UOBM are only simple conjunctive instance queries. Column 1 shows the query qi and extension templates q0i (1 ≤ i ≤ 6), columns 2 and 3 show the query answering times and the number of performed entailment checks for evalStatic, respectively, and columns 4 and 5 show the respective numbers for evalExt. In all queries the time spent for query extension is negligible compared to the time spent for query evaluation. We observe that for all queries, additional extension templates were derived, which have significantly better query bounds than the complex templates of the queries. This di- rectly translates to a significantly lower number of entailment checks for evalExt and, hence, a reduction in execution times. The reduction in query answering times is up to four orders of magnitude. References 1. Glimm, B., Kazakov, Y., Kollia, I., Stamou, G.: Lower and upper bounds for SPARQL queries over OWL ontologies. In: Proceedings of the 29th Conference on Artificial Intelli- gence (AAAI’15) (2015) 2. Glimm, B., Ogbuji, C. (eds.): SPARQL 1.1 Entailment Regimes. W3C Recommendation (2013), available at http://www.w3.org/TR/sparql11-entailment/ 3. Harris, S., Seaborne, A. (eds.): SPARQL 1.1 Query Language. W3C Recommendation (2013), available at http://www.w3.org/TR/sparql11-query/ 4. Kollia, I., Glimm, B.: Cost based query ordering over OWL ontologies. In: Proceedings of the 11th International Semantic Web Conference (ISWC 2012). Lecture Notes in Computer Science, vol. 7649, pp. 231–246. Springer (2012) 5. Kollia, I., Glimm, B.: Optimizing SPARQL Query Answering over OWL Ontologies. Journal of Artificial Intelligence Research 48, 253–303 (2013) 6. Ma, L., Yang, Y., Qiu, Z., Xie, G., Pan, Y., Liu, S.: Towards a complete OWL ontology bench- mark. In: The Semantic Web: Research and Applications, pp. 125–139. Lecture Notes in Com- puter Science, Springer (2006) 7. Pan, J.Z., Thomas, E., Zhao, Y.: Completeness guaranteed approximation for OWL-DL query answering. In: Proceedings of the 22nd International Workshop on Description Log- ics (DL’09). CEUR Workshop Proceedings, vol. 477. CEUR-WS.org (2009), http://dblp. uni-trier.de/db/conf/dlog/dlog2009.html#PanTZ09 8. Ren, Y., Pan, J.Z., Zhao, Y.: Soundness preserving approximation for TBox reasoning. In: Proceedings of the 25th National Conference on Artificial Intelligence (AAAI’10). AAAI Press (2010) 9. Zhou, Y., Nenov, Y., Grau, B.C., Horrocks, I.: Pay-as-you-go OWL query answering using a triple store. In: Proceedings of the 28th Conference on Artificial Intelligence (AAAI’14). pp. 1142–1148 (2014)