Hybrid Query Answering Over Expressive DL
                   Ontologies

                      Giorgos Stoilos and Giorgos Stamou

          School of Electrical and Computer Engineering, NTUA, Greece


1    Introduction
The design of tractable Description Logics, like EL [1], DL-Lite [2], and RL [5]
have spurred the design and implementation of highly scalable systems such as
OWLim, Oracle’s Semantic Graph, and more. The attractive properties of the
these systems have in many occasions led application developers to use them
even in cases where the input ontology is expressed in the far more expressive
SROIQ language. Although, in these cases there can be combinations of inputs
for which these systems will fail to compute all certain answers, nevertheless, this
notion of completeness is too coarse and these might still be able to compute
the correct answers for many input queries.
     In the current paper, we report on the following problem: given a query Q
containing only distinguished variables (i.e., a SPARQL query) over a SROIQ
TBox T and a (scalable) system ans complete for some fragment L of SROIQ,
is it possible to identify in an efficient way if ans is complete for Q, T ? We show
that this is possible if one has pre-computed the set (U) of atomic queries for
which ans is known to be complete. Then, using these techniques we develop an
approach to query answering which can decide at run-time whether Q can be
evaluated using ans or a fully-fledged SROIQ reasoner needs to be employed
together with ans to compute the right answers.
     We have instantiated our framework using the well-known RL system OWLim
and the SROIQ reasoner HermiT. Our experimental evaluation has shown that
for most test queries over LUBM and UOBM we can correctly determine if
OWLim is (in)complete in less than 50 millisecond, while there are only 3 queries
for which our tool replied “unknown”. Finally, our hybrid query answering sys-
tem was able to compute all answers to the test queries within milliseconds.
Compared to previous techniques that attempt to deliver scalable query answer-
ing over SROIQ TBoxes by using RL systems [11, 8], our approach has the
advantage that the set U always exists regardless of the expressivity of T , hence
it is readily applicable to arbitrary SROIQ ontologies. Moreover, the frame-
work is highly modular—that is, it can be instantiated using any combination
of systems and not only RL ones. This is an extended abstract of the paper [9].

2    Checking (In)Completeness of Systems
Consider the TBox T = {A v ∃S, ∃S v B, D v B} and a system ans that is
complete for RL fragment of SHOIQ [5]. Then, the behaviour of ans can be
characterised by the TBox T |RL = {∃S v B, D v B} since ans discards axioms
with existentials in the RHS. Clearly, ans is complete for the atomic queries
Q1 = S(x, y) and Q2 = D(x) and, moreover, it is clearly complete for the query
Q = S(x, y) ∧ D(x).
    The completeness of ans w.r.t. Q, T is rather expected given the fact that
Q is formed by atoms S(x, y) and D(x) and ans is complete for queries Q1 and
Q2 which correspond to these atoms. This suggests that given a set of atomic
queries over which ans is known to be complete, called query base (QB) U of ans
for T , then we can deduce the completeness of ans w.r.t. an arbitrary SPARQL
query Q by checking if for each of its atoms there is an isomorphic query in U.
Actually, as shown next, it suffices for an atom to “appear” in U or be “entailed”
by other atoms of Q that appear in U.
Theorem 1. Let T be a SROIQ-TBox, let ans be a system complete for a DL
L which over T is characterised by the TBox T |L , let U be a QB of ans for T , and
let Q be a SPARQL query. Let also B be all atoms of Q for whichVan isomorphic
query in U exists. If each atom α in Q is either in B or T |L |= B → α, then
ans is (Q, T )-complete.
    However, the previous provides only a sufficient condition for completeness
(cf. [9, Example 7]). Unfortunately, to devise both a sufficient and necessary
condition one most likely needs to quantify over all minimal “representative”
ABoxes A such that ans would not compute all answers over T ∪ A and this set
can be infinite [3]. Hence, there are (perhaps) strong theoretical limitations in
devising such a condition.
    To alleviate this issue we have also devised a sufficient (easy to check) syn-
tactic condition for concluding incompleteness. This condition is based on the
notion of reachability between the symbols of a TBox T [10, 7]. Note that Q
needs to additionally satisfy a certain consistency Property (]) [9].
Theorem 2. Let LH be a Horn DL and let ans be a system whose behaviour
over T is characterised by T |LH . Moreover, let T , U, B be as in Theorem 1, and
let Q be a SPARQL CQ satisfying Property (]) [9]. If an atom α of Q is not
Sig(B)-reachable in T |LH , then ans is (Q, T )-incomplete.
Clearly, there can be CQs for which neither of the above theorems hold and
hence for which we cannot determine if ans is complete or not.

Constructing QBs In Practice An interesting feature of QBs is that they
always exist (a system ans is either complete or not for an atomic query and for
a given TBox T there is only a finite number of them). However, a central issue
is how to construct the QB in an easy (i.e., automatic) way. By recent results [3]
and developed software1 this is to a large extent possible and, although there are
some negative results regarding the techniques in [3, 4], we strongly believe that
even in the theoretically problematic cases QBs can be computed effectively (cf.
[9, Section 5]).
1
    https://code.google.com/p/sygenia/
3   Hybrid Query Answering

The straightforward way to exploit our techniques is to use them at query time to
decide if a given query Q can be evaluated using some scalable system ans or we
need to resort to a SROIQ reasoner. However, in queries with only distinguished
variables we can provide an even more fine-grained approach. More precisely, Q
can be split into the part Qc for which ans is known to be complete (determined
using the previous techniques) and the part Qi that is not. Then, we can evaluate
Qc using ans, Qi using a SROIQ reasoner and finally join the results. Evan
more, ans can also be used to further improve the performance of the second
step as follows: first, the evaluation of Qc using ans provides an upper bound
of the answers (since the contraints in the Qi part are missing) and, second, we
can also evaluate Qi using ans to quickly provide with a lower bound. These
two bounds are passed to the SROIQ reasoner in order to restrict the search
to only those individuals that are in the upper and not in the lower bound. For
the detailed algorithm the reader is referred to [9].


4   Evaluation

We have instantiated our framework using the well-known scalable RL sys-
tem OWLim and the SROIQ reasoner HermiT. The implemented system is
called Hydrowl and is available at http://www.image.ece.ntua.gr/~gstoil/
hydrowl.
    First, at a pre-processing step, we used Hydrowl to compute a QB for OWLim
for ontologies LUBM and UOBM. For LUBM we required 14.5 seconds while for
UOBM we required 48.7 seconds (some manual editing was required for UOBM).
Second, we used Hydrowl to check if OWLim is complete for all the test queries
of LUBM and UOBM. When all the atoms of a test query appear in the set B
(cf. Theorem 1) our tool replies “complete” almost instantaneously V (less than
5ms). However, even when for some atom α we tested T |L |= B → α, the
tool still required less than 50 milliseconds. For LUBM we were able to correctly
identify (in)completeness of OWLim for all except for two queries (Hydrowl
replied “unknown”), wihle for UOBM for all except just one. It is worth noting
that for the unknown cases OWLim is actually incomplete.
    Fnially, we used Hydrowl to evaluate all test queries of LUBM using 5 uni-
versities and of UOBM using 1 department and we have compared against the
HermiT-BGP system [6]. Table 1 presents the results (in seconds) for all the
interesting queries (for the rest both systems have similar response times). Grey
coloured columns mark those queries where Hydrowl uses both OWLim and
HermiT; in all other cases only OWLim is used. As can be seen, in all queries
Hydrowl is considerably faster than HermiT-BGP.


Acknowledgements Giorgos Stoilos was funded by a Marie Curie Career Rein-
tegration Grant within EU’s FP7 under grant agreement 303914.
                           Table 1. Query Answering Times

                                   LUBM               UOBM
                   Query       1    3 8 9     3   4    9 11 12 14
               HermiT-BGP 2.5 1.4 1.4 105 204 5.8 21.6 1.7 1.2 48.7
                 Hydrowl  .07 .07 .24 .13 .02 .01 .01 .07 .04 35.3


References
 1. Franz Baader, Sebastian Brandt, and Carsten Lutz. Pushing the EL envelope. In
    Proceedings of the 19th International Joint Conference on Artificial Intelligence
    (IJCAI-05), pages 364–369, 2005.
 2. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini,
    and Riccardo Rosati. Tractable reasoning and efficient query answering in de-
    scription logics: The DL-Lite family. J. of Automated Reasoning, 39(3):385–429,
    2007.
 3. Bernardo Cuenca Grau, Boris Motik, Giorgos Stoilos, and Ian Horrocks. Complete-
    ness guarantees for incomplete ontology reasoners: Theory and practice. Journal
    of Artificial Intelligence Research, 43:419–476, 2012.
 4. Bernardo Cuenca Grau, Boris Motik, Giorgos Stoilos, and Ian Horrocks. Comput-
    ing datalog rewritings beyond horn ontologies. In Proceedings of the Twenty-Third
    International Joint Conference on Artificial Intelligence (IJCAI 2013), 2013.
 5. Benjamin N. Grosof, Ian Horrocks, Raphael Volz, and Stefan Decker. Description
    logic programs: Combining logic programs with description logic. In Proceedings
    of the Twelfth International World Wide Web Conference (WWW 2003), pages
    48–57. ACM, 2003.
 6. Ilianna Kollia and Birte Glimm. Optimizing sparql query answering over owl
    ontologies. Journal of Artificial Intelligence Research (JAIR), 48:253–303, 2013.
 7. Riku Nortje, Katarina Britz, and Thomas Meyer. Reachability modules for the
    description logic SRIQ. In Proceedings of the International Conference on Logic
    for Programming Artificial Intelligence and Reasoning (LPAR 19), 2013.
 8. Giorgos Stoilos. Ontology-based data access using rewriting, OWL 2 RL systems
    and repairing. In Proceedings of the 11th European Semantic Web Conference
    (ESWC 2014), 2014.
 9. Giorgos Stoilos and Giorgos Stamou. Hybrid query answering for owl ontologies.
    Proceedings of the 21st European Conference on Artificial Intelligence (ECAI 14),
    2014. available at: http://www.image.ece.ntua.gr/~gstoil/temp/main.pdf.
10. Boontawee Suntisrivaraporn. Module extraction and incremental classification:
    A pragmatic approach for ontologies. In Proceedings of European Semantic Web
    Conference (ESWC 2008), pages 230–244, 2008.
11. Yujiao Zhou, Yavor Nenov, Bernardo Cuenca Grau, and Ian Horrocks. Complete
    query answering over horn ontologies using a triple store. In Proceedings of the
    12th International Semantic Web Conference (ISWC 2013), 2013.