Introduction

Hybrid Query Answering Over Expressive DL Ontologies

Giorgos Stoilos

0 1

Giorgos Stamou

0 1 0 Checking (In)Completeness of Systems 1 School of Electrical and Computer Engineering , NTUA , Greece

The design of tractable Description Logics, like E L [1], DL-Lite [2], and RL [5] have spurred the design and implementation of highly scalable systems such as OWLim, Oracle's Semantic Graph, and more. The attractive properties of the these systems have in many occasions led application developers to use them even in cases where the input ontology is expressed in the far more expressive SROIQ language. Although, in these cases there can be combinations of inputs for which these systems will fail to compute all certain answers, nevertheless, this notion of completeness is too coarse and these might still be able to compute the correct answers for many input queries. In the current paper, we report on the following problem: given a query Q containing only distinguished variables (i.e., a SPARQL query) over a SROIQ TBox T and a (scalable) system ans complete for some fragment L of SROIQ, is it possible to identify in an e cient way if ans is complete for Q; T ? We show that this is possible if one has pre-computed the set (U ) of atomic queries for which ans is known to be complete. Then, using these techniques we develop an approach to query answering which can decide at run-time whether Q can be evaluated using ans or a fully- edged SROIQ reasoner needs to be employed together with ans to compute the right answers. We have instantiated our framework using the well-known RL system OWLim and the SROIQ reasoner HermiT. Our experimental evaluation has shown that for most test queries over LUBM and UOBM we can correctly determine if OWLim is (in)complete in less than 50 millisecond, while there are only 3 queries for which our tool replied \unknown". Finally, our hybrid query answering system was able to compute all answers to the test queries within milliseconds. Compared to previous techniques that attempt to deliver scalable query answering over SROIQ TBoxes by using RL systems [11, 8], our approach has the advantage that the set U always exists regardless of the expressivity of T , hence it is readily applicable to arbitrary SROIQ ontologies. Moreover, the framework is highly modular|that is, it can be instantiated using any combination of systems and not only RL ones. This is an extended abstract of the paper [9].

Introduction

Consider the TBox T = fA v 9S; 9S v B; D v Bg and a system ans that is complete for RL fragment of SHOIQ [ 5 ]. Then, the behaviour of ans can be characterised by the TBox T jRL = f9S v B; D v Bg since ans discards axioms with existentials in the RHS. Clearly, ans is complete for the atomic queries Q1 = S(x; y) and Q2 = D(x) and, moreover, it is clearly complete for the query Q = S(x; y) ^ D(x).

The completeness of ans w.r.t. Q; T is rather expected given the fact that Q is formed by atoms S(x; y) and D(x) and ans is complete for queries Q1 and Q2 which correspond to these atoms. This suggests that given a set of atomic queries over which ans is known to be complete, called query base (QB) U of ans for T , then we can deduce the completeness of ans w.r.t. an arbitrary SPARQL query Q by checking if for each of its atoms there is an isomorphic query in U . Actually, as shown next, it su ces for an atom to \appear" in U or be \entailed" by other atoms of Q that appear in U .

Theorem 1. Let T be a SROIQ-TBox, let ans be a system complete for a DL L which over T is characterised by the TBox T jL, let U be a QB of ans for T , and let Q be a SPARQL query. Let also B be all atoms of Q for which an isomorphic query in U exists. If each atom in Q is either in B or T jL j= V B ! , then ans is (Q; T )-complete.

However, the previous provides only a su cient condition for completeness (cf. [9, Example 7]). Unfortunately, to devise both a su cient and necessary condition one most likely needs to quantify over all minimal \representative" ABoxes A such that ans would not compute all answers over T [ A and this set can be in nite [ 3 ]. Hence, there are (perhaps) strong theoretical limitations in devising such a condition.

To alleviate this issue we have also devised a su cient (easy to check) syntactic condition for concluding incompleteness. This condition is based on the notion of reachability between the symbols of a TBox T [ 10, 7 ]. Note that Q needs to additionally satisfy a certain consistency Property (]) [ 9 ]. Theorem 2. Let LH be a Horn DL and let ans be a system whose behaviour over T is characterised by T jLH . Moreover, let T ; U ; B be as in Theorem 1, and let Q be a SPARQL CQ satisfying Property (]) [ 9 ]. If an atom of Q is not Sig(B)-reachable in T jLH , then ans is (Q; T )-incomplete.

Clearly, there can be CQs for which neither of the above theorems hold and hence for which we cannot determine if ans is complete or not.

Constructing QBs In Practice An interesting feature of QBs is that they always exist (a system ans is either complete or not for an atomic query and for a given TBox T there is only a nite number of them). However, a central issue is how to construct the QB in an easy (i.e., automatic) way. By recent results [ 3 ] and developed software1 this is to a large extent possible and, although there are some negative results regarding the techniques in [ 3, 4 ], we strongly believe that even in the theoretically problematic cases QBs can be computed e ectively (cf. [9, Section 5]). 1 https://code.google.com/p/sygenia/

Hybrid Query Answering

The straightforward way to exploit our techniques is to use them at query time to decide if a given query Q can be evaluated using some scalable system ans or we need to resort to a SROIQ reasoner. However, in queries with only distinguished variables we can provide an even more ne-grained approach. More precisely, Q can be split into the part Qc for which ans is known to be complete (determined using the previous techniques) and the part Qi that is not. Then, we can evaluate Qc using ans, Qi using a SROIQ reasoner and nally join the results. Evan more, ans can also be used to further improve the performance of the second step as follows: rst, the evaluation of Qc using ans provides an upper bound of the answers (since the contraints in the Qi part are missing) and, second, we can also evaluate Qi using ans to quickly provide with a lower bound. These two bounds are passed to the SROIQ reasoner in order to restrict the search to only those individuals that are in the upper and not in the lower bound. For the detailed algorithm the reader is referred to [ 9 ]. 4

Evaluation

We have instantiated our framework using the well-known scalable RL system OWLim and the SROIQ reasoner HermiT. The implemented system is called Hydrowl and is available at http://www.image.ece.ntua.gr/~gstoil/ hydrowl.

First, at a pre-processing step, we used Hydrowl to compute a QB for OWLim for ontologies LUBM and UOBM. For LUBM we required 14.5 seconds while for UOBM we required 48.7 seconds (some manual editing was required for UOBM). Second, we used Hydrowl to check if OWLim is complete for all the test queries of LUBM and UOBM. When all the atoms of a test query appear in the set B (cf. Theorem 1) our tool replies \complete" almost instantaneously (less than 5ms). However, even when for some atom we tested T jL j= V B ! , the tool still required less than 50 milliseconds. For LUBM we were able to correctly identify (in)completeness of OWLim for all except for two queries (Hydrowl replied \unknown"), wihle for UOBM for all except just one. It is worth noting that for the unknown cases OWLim is actually incomplete.

Fnially, we used Hydrowl to evaluate all test queries of LUBM using 5 universities and of UOBM using 1 department and we have compared against the HermiT-BGP system [ 6 ]. Table 1 presents the results (in seconds) for all the interesting queries (for the rest both systems have similar response times). Grey coloured columns mark those queries where Hydrowl uses both OWLim and HermiT; in all other cases only OWLim is used. As can be seen, in all queries Hydrowl is considerably faster than HermiT-BGP.

Acknowledgements Giorgos Stoilos was funded by a Marie Curie Career Reintegration Grant within EU's FP7 under grant agreement 303914. Query

LUBM 1 3 8 9

Franz

Baader ,

Sebastian

Brandt , and

Carsten

Lutz . Pushing the EL envelope . In Proceedings of the 19th International Joint Conference on Arti cial Intelligence (IJCAI-05) , pages 364 { 369 , 2005 .

2. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini, and

Riccardo

Rosati . Tractable reasoning and e cient query answering in description logics: The DL-Lite family . J. of Automated Reasoning , 39 ( 3 ): 385 { 429 , 2007 .

Bernardo

Cuenca Grau , Boris Motik, Giorgos Stoilos, and

Ian

Horrocks . Completeness guarantees for incomplete ontology reasoners: Theory and practice . Journal of Arti cial Intelligence Research , 43 : 419 { 476 , 2012 .

Bernardo

Cuenca Grau , Boris Motik, Giorgos Stoilos, and

Ian

Horrocks . Computing datalog rewritings beyond horn ontologies . In Proceedings of the Twenty-Third International Joint Conference on Arti cial Intelligence (IJCAI 2013 ), 2013 .

5. Benjamin

Grosof , Ian Horrocks, Raphael Volz, and Stefan

Decker . Description logic programs: Combining logic programs with description logic . In Proceedings of the Twelfth International World Wide Web Conference (WWW 2003 ), pages 48 { 57 . ACM, 2003 .

Ilianna

Kollia and

Birte

Glimm . Optimizing sparql query answering over owl ontologies . Journal of Arti cial Intelligence Research (JAIR) , 48 : 253 { 303 , 2013 .

Riku

Nortje , Katarina Britz, and Thomas Meyer. Reachability modules for the description logic SRIQ . In Proceedings of the International Conference on Logic for Programming Arti cial Intelligence and Reasoning (LPAR 19) , 2013 .

Giorgos

Stoilos . Ontology-based data access using rewriting, OWL 2 RL systems and repairing . In Proceedings of the 11th European Semantic Web Conference (ESWC 2014 ), 2014 .

Giorgos

Stoilos and

Giorgos

Stamou . Hybrid query answering for owl ontologies . Proceedings of the 21st European Conference on Arti cial Intelligence (ECAI 14) , 2014 . available at: http://www.image.ece.ntua.gr/~gstoil/temp/main.pdf.

10.

Boontawee

Suntisrivaraporn . Module extraction and incremental classi cation: A pragmatic approach for ontologies . In Proceedings of European Semantic Web Conference (ESWC 2008 ), pages 230 { 244 , 2008 .

11. Yujiao

Zhou

, Yavor Nenov, Bernardo Cuenca Grau, and

Ian

Horrocks . Complete query answering over horn ontologies using a triple store . In Proceedings of the 12th International Semantic Web Conference (ISWC 2013 ), 2013 .