Introduction

cient Query Answering in DL-Lite through FOL Reformulation (Extended Abstract)

Damian Bursztyn

Francois Goasdoue

Ioana Manolescu

INRIA

U. Paris-Sud

France

0 U. Rennes 1 & INRIA , France

We propose a general query optimization framework for formalisms enjoying FOL reducibility of query answering, for which it reduces to the evaluation of a FOL query against facts. This framework allows searching within a set of alternative equivalent FOL queries, i.e., FOL reformulations, one with minimal evaluation cost when evaluated through a relational database management system. We provide two algorithms, an exhaustive and a greedy, for exploring the optimization space. This framework is applied to the lightweight description logic DL-LiteR underpinning the W3C's OWL2 QL pro le, for which an experimental evaluation validates the interest and applicability of our technique.

Introduction Cover-based query answering optimization

RDBMS query optimizers consider a set of evaluation alternatives (a.k.a. logical and physical plans), and select the one minimizing a cost estimation function. Since the number of alternatives is in O(2n n!) for a conjunctive query (CQ) of n atoms [ 4 ], modern optimizers rely on heuristics to explore only a few alternatives; this works (very) well for small-to-moderate size CQs. However, FOL reformulations go beyond CQs in general, and may be extremely large, leading the RDBMS to perform poorly.

To work around this limitation, we introduce the cover-based query answering technique to de ne a space of equivalent FOL reformulations of a CQ. A cover de nes how the query is split into subqueries, that may overlap, called fragment queries, such that substituting each subquery with its FOL reformulation (obtained from any state-of-the-art technique) and joining the corresponding (reformulated) subqueries, may yield a FOL reformulation for the query to answer. Not every cover of a query leads to a FOL reformulation; but every cover which does, yields an alternative cover-based FOL reformulation of the original query. Crucially for our problem, a smart cover choice may lead to a cover-based reformulation whose evaluation is more e cient. Thus, the cover-based technique amounts to circumventing the di culty of modern RDBMSs to e ciently evaluate FOL reformulations in general.

Problem 1 (Optimization problem). Given a CQ q and a description logic KB K, the cost-driven cover-based query answering problem consists of nding a coverbased reformulation of q based on K with lowest (estimated) evaluation cost.

We solve this problem for DL-LiteR in two steps. First, we provide a su cient condition for a cover to be safe for query answering, i.e., to lead to a cover-based FOL reformulation. The main idea for this condition is to have a cautious approximation of the query atoms which are interdependent w.r.t. reformulation, i.e., which (directly or after specialization) unify through state-of-the-art reformulation techniques, and keep them in the same cover fragment. The space of all covers of a query q satisfying this condition is denoted Lq; all Lq covers turn out to correspond to some fusion of fragments from a certain root cover we denote Croot. We also re ne our su cient condition to identify an extended space of covers Eq, which includes Lq and also leads to FOL reformulations of q.

Second, based on a function estimating the evaluation cost of a given FOL query through an RDBMS, we devise two cover search algorithms. The rst one, termed EC-DL (Exhaustive Covers), starts from Croot and explores all Eq covers in the case of DL-LiteR. The second one, named GC-DL (Greedy Covers), also starts from Croot but explores Eq partially, in greedy fashion. It uses an explored cover set initialized with fCrootg, from which it picks a cover C inducing a qFOL reformulation with minimum cost (C), and attempts to build from C a cover C0, by fusing two fragments, or adding (copying) an atom to a fragment. GC-DL only adds C0 to the explored set if (C0) < (C), thus it only explores a small part of the search space. Both algorithms return a cover-based reformulation with the minimum estimated cost w.r.t. the explored space. When fusing two fragments into one, or adding an atom to a fragment, (C) decreases if the new fragment is more selective than the fragment(s) it replaces. Therefore, the RDBMS may nd a more e cient way to evaluate the query of this new fragment, and/or its result may be smaller, making the evaluation of qFOL based on the new cover C0 faster. We implemented our cover-based approach in Java 7, on top of PostgreSQL v9.3.2. We used the LUBM290 DL-LiteR TBox and associated EUDG data generator [ 3 ]: LUBM290 consists of 34 roles, 128 concepts and 212 constraints; the generated ABox comprises 15 million facts. We chose RAPID [ 2 ] for CQ-to-UCQ (unions of CQs) reformulation. For , we used Postgres' own estimation, obtained using the explain directive. We devised a set of 13 CQs, ranging from 2 to 10 atoms (5:77 on average); their UCQ reformulations have 35 to 667 CQs (290:2 on average).

Figure 1(a) depicts the evaluation time through Postgres, of four FOL reformulations: (i) the UCQ produced by RAPID [ 2 ]; (ii) the JUCQ (joins of UCQs) reformulation based on Croot; (iii) the JUCQ reformulation corresponding to the best-performing cover found by our algorithm EC-DL, and (iv) the JUCQ reformulation based on the best-performing cover found by GC-DL. First, the gure shows that xed FOL reformulations are not e ciently evaluated, e.g., UCQ for Q1, Q5 and Q9-Q11, and the one based on Croot for Q6-Q8 and Q13. This poor performance correlates with the large size of the UCQ reformulations: such very large unions of CQs are very poorly handled by current RDBMS optimizers, which are designed and tuned for small CQs. Second, the reformulation based on the cover returned by EC-DL is always more e cient than UCQ reformulation (more than one order of magnitude for Q5), respectively, Croot-based reformulation (up to a factor of 230 for Q6). Third, in our experiments, the GC-DL-chosen cover leads to a JUCQ reformulation as e cient as the EC-DL one, demonstrating that even a partial, greedy cover search leads to good performance (this cannot be guaranteed in general). For Q7 and Q9-Q13, the best cover we found is safe; for all the others, this is not the case, con rming the interest of the larger space Eq.

Figure 1(b) depicts the running time of the EC-DL and GC-DL algorithms, which can be seen as the overhead of our cover-based technique. The time is very small, between 2 ms (Q11-Q13, with just 2 atoms) and 221 ms (EC-DL on Q10, of 10 atoms). The time is higher for more complex queries, but these are precisely the cases where our techniques are most bene cial, e.g., for Q10, EC-DL runs in less than 2% of the time to evaluate the UCQ reformulation, while the cover we recommend is more than 4 times faster than UCQ. As expected, GC-DL is faster than EC-DL due to the exploration of less covers. Together, Figure 1(a) and 1(b) con rm the bene ts and practical interest of our cost-based cover search. Acknowledgements This work has been partially funded by the Programme Investissement d'Avenir Datalyse project and the ANR PAGODA project.

1. Calvanese , D. , Giacomo , G.D. , Lembo , D. , Lenzerini , M. , Rosati , R. : Tractable reasoning and e cient query answering in description logics: The DL-Lite family . JAR 39 ( 3 ), 385 { 429 ( 2007 )

2. Chortaras , A. , Trivela , D. , Stamou , G.B. : Optimized query rewriting for OWL 2 QL . In: CADE ( 2011 )

3. Lutz , C. , Seylan , I. , Toman , D. , Wolter , F. : The combined approach to OBDA: taming role hierarchies using lters . In: ISWC . pp. 314 { 330 ( 2013 )

4. Ono , K. , Lohman , G.M. : Measuring the complexity of join enumeration in query optimization . In: VLDB ( 1990 )

5. Perez-Urbina , H. , Horrocks , I. , Motik , B. : E cient query answering for OWL 2 . In: ISWC ( 2009 )

6. Rosati , R. , Almatelli , A. : Improving query answering over DL-Lite ontologies . In: KR ( 2010 )

7. Venetis , T. , Stoilos , G. , Stamou , G.B. : Incremental query rewriting for OWL 2 QL . In: Description Logics ( 2012 )