OBDA Using RL Reasoners and Repairing?

                                  Giorgos Stoilos

                   School of Electrical and Computer Engineering
                   National Technical University of Athens, Greece


1     Introduction
Rewriting the input TBox T (and query Q) into a datalog program, called T -
rewriting ((Q,T )-rewriting), is a prominent approach to ontology-based data
access [1]. It was used as early as the KAON2 system [5] and it currently con-
sists of (perhaps) the standard approach to answering queries over ontologies
expressed in the languages DL-Lite [2, 12] and ELHI [8, 16].
    Apart from computing a rewriting, a problem that has been proven very diffi-
cult is how to subsequently (effectively) evaluate it over the data. In many cases,
due to its size and/or complexity, additional techniques need to be devised [11,
9, 10]. Unfortunately, all of them have so far been applied only on DL-Lite while
for more expressive languages, apart from the work in [3], to the best of our
knowledge, no other integrated query answering system has been reported or
evaluated with large and complex ontologies.
    An interesting observation is that T -rewritings are usually of a particular
form—that is, they can be translated back into an RL TBox [14].1 Hence, scal-
able RL system, like OWLim, can be used to evaluate them over the data. More
precisely, it was shown how, given a SROIQ TBox T and an RL system ans, a
rewriting of T can be used to compute a set of axioms R (called repair ) such that
for every CQ Q with only distinguished variables (i.e., SPARQL CQs) and every
ABox A we have cert(Q, T ∪ A) ⊆ ans(Q, T ∪ R ∪ A). That is, after repairing,
ans is indistinguishable from a SROIQ reasoner w.r.t. SPARQL CQs.
    Although, the experimental results in [14] provided with encouraging results,
these were quite preliminary. First, the implementation used an arguably ob-
solete system (Requiem) and had no optimisations. Second, the evaluation was
based on LUBM (an arguably trivial TBox) and a small fragment of Galen.
Hence, it was unclear whether repairing can be applied to large and complex
TBoxes. Third, the approach could only handle SPARQL CQs.
    In the current paper we attempt to extensively study repairing as an approach
to OBDA over expressive TBoxes. First, we propose several optimisations and
refinements to the first prototype in order to handle large and complex TBoxes.
Second, we show how arbitrary queries can be supported. Third, we perform an
extensive experimental evaluation which showed that we can handle very large
and complex TBoxes (e.g., Galen, GO, Fly) and answer arbitrary CQs over one
of them (Fly) within milliseconds. This is an extended abstract of paper [13].
?
    Funded by an EU FP7 Marie Curie Career Reintegration Grant (No.: 303914).
1
    In OWL and Semantic Web jargon that is an OWL 2 RL TBox [4].
2     Computing Repairs Efficiently
Let T be a TBox and let ans be an RL system. Roughly speaking, a repair R
of ans for T is computed by first computing a T -rewriting (Rew) for T , then
removing from Rew the parts that ans can already handle, and finally, minimising
the resulting set. For example, if T = {A v ∃R, ∃R v B}, then Rew = {A(x) →
B(x), R(x, y) → B(x)} is a T -rewriting of T , while R = {A v B} is the desirable
repair for ans. We call Repair(T ) the overall procedure.
    Compared to [14] we have performed two important refinements. First, ac-
cording to [14] a repair R can only be computed by T -rewritings Rew such
that T |= Rew. This is quite restrictive as it prohibits the use of many efficient
rewriting algorithms which normalise the input TBox T into T 0 by introducing
fresh symbols. To be able to use such approaches to compute repairs we notice
that, for T 0 the normalised TBox used internally by the rewriting system to
compute Rew, we have cert(Q, T ∪ A) ⊆ ans(Q, T 0 ∪ Repair(T 0 ) ∪ A). Second,
the minimisations steps applied by Repair are based on expensive FOR-loops
over the initial T -rewriting in which HermiT is invoked. Despite how optimised
a SROIQ reasoner is, in large ontologies, these entailment checks could simply
be too many. To improve the behaviour we have interveaned in the internals of
HermiT to implement a form of incremental entailment checking. First, we ex-
haustively apply the calculus over T to construct an initial model and we mark
the completion of the execution. Then, to check T |= α we resume the execution
from the previous point and after completion we backtrack to the marked point.

3     Supporting non-SPARQL Queries
Let Q be an arbitrary CQ with query predicate Q, let T be a TBox, and let
RewD ] RewQ be a (Q, T )-rewriting (RewD is a datalog program not mentioning
Q and RewQ a UCQ that mentions Q). Since RewD does not mention Q it
captures only ground entialments of T over some A. But, after repairing, ans
is complete w.r.t. all ground entailments for any A; hence cert(Q, T ∪ A) ⊆
ans(RewQ , T ∪ Repair(T ) ∪ A). The following summarises our approach:
 1. Compute a repair R of T for ans using procedure Repair.
 2. Load the dataset A, the input TBox T , and the repair R to ans.
 3. For a CQ Q, if Q is SPARQL then evaluated it over ans; otherwise compute
    a (Q, T )-rewriting RewD ] RewQ and evaluate RewQ over ans.
Note that most steps, i.e., steps 1 and 2, can be done only once as a pre-processing
(changes in A can be handled incrementally).

4     Implementation and Evaluation
We have implemented a prototype ontology repair and query answering tool
called Hydrowl.2 The tool uses Rapid [15] to compute T -rewritings and Her-
miT [7] and OWLim [6] to minimise it into the final repair.
2
    http://www.image.ece.ntua.gr/~gstoil/hydrowl/
Table 1. |T | (|R|): number of axioms of the input TBox (repair), t: time in seconds.

       T         |T |         |R|          t         T       |T |       |R|   t
   Not-Galen 5471           3015(4153)   298(42) Galen-doc 4229 6051(6176) 1152(28)
      Fly       19845 10361(12368) 2884(178)       Galen    4229 3012(3062) 257(24)

           Table 2. Loading times for Fly and UOBM for the various ABoxes.

                Universities                           A 2×A 3×A 4×A 5×A
           1 2 5 10 20                         Fly    14.0 21.9 22.7 27.9 31.5
   UOBM   4.1 6.8 16.2 31.9 73.2               Fly∪R 31.9 55.1 68.5 93.0 119.3
   UOBM∪R 4.4 8.3 24.3 44.9 108.1              Fly∪R− 33.2 62.1 70.1 100.6 118.2
                (a) UOBM                                      (b) Fly


    Using Hydrowl we managed to compute repairs for 151 out of the 152 ontolo-
gies of our dataset, which contained some large and highly complex ones. In the
vast majority of cases a repair could be computed in less than a second, only a
handfull required up to a few minutes, and only the very large ones several min-
utes; Table 1 presents results for the latter. Despite their size and complexity we
see that we can compute repairs for them in a reasonable amount of time (recall
this is usually done only once). Actually, if we discard a very expensive minimisa-
tion step, then we can compute some (non-minimal) repair very efficiently while
its size is not considerably larger than the minimal one (see Table 1 numbers in
parenthesis). The system in [14] could not handle any of these ontologies.
    Next, Table 2 presents loading experiments to OWLim using UOBM (1 to
20 universities) and Fly (ABox multiplied up to 5 times). As can be seen, the
overhead introduced by repairing is significantly noticeable only in Fly, mostly
due to the size of the computed repair (R), however, recall that loading is usually
performed once. In Fly we have also loaded the non-minimal repair (R− ) and
as it turns out there is no significant difference compared to the minimal one.


                        Table 3. Results for Answering the Fly Queries

              Q1                Q2                Q4                Q5
       |RewQ | trew tans |RewQ | trew tans |RewQ | trew tans |RewQ | trew tans
           64 0.31 0.31    2880 0.90 1.28      91 0.07 0.04       6 0.05 0.02


   The Fly ontology comes with 4 non-SPARQL queries. Table 3 presents the
results of our technique from Section 3. As we can see in most cases we were
able to compute and evaluate a rewriting almost instantaneously. This greatly
outperformes previous approaches on Fly [17, 18] which require several seconds
per query. The good behaviour of our approach can be attributed to the fact
that most hard work is pushed to a pre-processing step while RewQ , the only
thing computed on-line, is usually small and of simple structure.
References

 1. Diego Calvanese De Giacomo Giuseppe Antonella Poggi, Domenico Lembo, Mau-
    rizio Lenzerini, and Riccardo Rosati. Linking data to ontologies. Journal on Data
    Semantics, X:133–173, 2008.
 2. Diego Calvanese, Giuseppe De Giacomo, Domenico Lembo, Maurizio Lenzerini,
    and Riccardo Rosati. Tractable reasoning and efficient query answering in descrip-
    tion logics: The DL-Lite family. Journal of Automated Reasoning, 39(3):385–429,
    2007.
 3. Thomas Eiter, Magdalena Ortiz, Mantas Simkus, Trung-Kien Tran, and Guohui
    Xiao. Query rewriting for Horn-SHIQ plus rules. In Proc. of AAAI, 2012.
 4. Benjamin N. Grosof, Ian Horrocks, Raphael Volz, and Stefan Decker. Description
    logic programs: Combining logic programs with description logic. In Proceedings
    of the Twelfth International World Wide Web Conference (WWW 2003), pages
    48–57. ACM, 2003.
 5. Ullrich Hustadt, Boris Motik, and Ulrike Sattler. Deciding expressive description
    logics in the framework of resolution. Information & Computation, 206(5):579–601,
    2008.
 6. Atanas Kiryakov, Barry Bishoa, Damyan Ognyanoff, Ivan Peikov, Zdravko Tashev,
    and Ruslan Velkov. The Features of BigOWLIM that Enabled the BBCs World
    Cup Website. In Workshop on Semantic Data Management (SemData), 2010.
 7. Boris Motik, Rob Shearer, and Ian Horrocks. Hypertableau Reasoning for Descrip-
    tion Logics. Journal of Artificial Intelligence Research, 36:165–228, 2009.
 8. Héctor Pérez-Urbina, Boris Motik, and Ian Horrocks. Tractable query answer-
    ing and rewriting under description logic constraints. Journal of Applied Logic,
    8(2):186–209, 2010.
 9. Mariano Rodriguez-Muro and Diego Calvanese. High performance query answering
    over DL-Lite ontologies. In Proceedings of the Thirteenth International Conference
    on Principles of Knowledge Representation and Reasoning (KR 2012), 2012.
10. Mariano Rodrı́guez-Muro, Roman Kontchakov, and Michael Zakharyaschev.
    Ontology-based data access: Ontop of databases. In Proceedings of the Interna-
    tional Semantic Web Conference (ISWC 2013), pages 558–573, 2013.
11. Riccardo Rosati. Prexto: Query rewriting under extensional constraints in dl-lite.
    In 9th Extended Semantic Web Conference (ESWC 2012), pages 360–374, 2012.
12. Riccardo Rosati and Alessandro Almatelli. Improving query answering over DL-
    Lite ontologies. In Proceedings of the 12th International Conference on the Prin-
    ciples of Knowledge Representation and Reasoning (KR-10), 2010.
13. Giorgos Stoilos. Ontology-based data access using rewriting, OWL 2 RL systems
    and repairing. In Proceedings of the 11th European Semantic Web Conference
    (ESWC 2014), 2014.
14. Giorgos Stoilos, Bernardo Cuenca Grau, Boris Motik, and Ian Horrocks. Repair-
    ing ontologies for incomplete reasoners. In Proceedings of the 10th International
    Semantic Web Conference (ISWC-11), Bonn, Germany, pages 681–696, 2011.
15. Depoina Trivela, Giorgos Stoilos, Alexandros Chortaras, and Giorgos Stamou. Op-
    timising resolution-based rewriting algorithms for DL ontologies. In Proceedings of
    the 26th Workshop on Description Logics (DL 2013), 2013.
16. Despoina Trivela, Giorgos Stoilos, Alexandros Chortaras, and Giorgos Stamou.
    Optimising resolution-based rewriting algorithms for dl ontologies. In Proceedings
    of the 26th Workshop on Description Logics (DL 2013), Ulm, Germany, 2013.
17. Yujiao Zhou, Bernardo Cuenca Grau, Ian Horrocks, Zhe Wu, and Jay Banerjee.
    Making the most of your triple store: Query answering in OWL 2 using an RL
    reasoner. In Proc, WWW 2013, pages 1569–1580, 2013.
18. Yujiao Zhou, Yavor Nenov, Bernardo Cuenca Grau, and Ian Horrocks. Complete
    query answering over horn ontologies using a triple store. In Proc. of the 12th
    International Semantic Web Conference (ISWC). Springer LNCS, 2013.